Novel searching program for RNA homologous sequences based on Map-Reduce Framework
Author: | Ali Bekri, Abdelhakim El Fatmi, Said Benhlima |
---|---|
Year of publication: | 2019 |
Subject: |
0303 health sciences; Phylogenetic tree; Computer science; Protein Data Bank (RCSB PDB); RNA; Rfam; Computational biology; computer.file_format; Protein Data Bank; Homologous Sequences; 03 medical and health sciences; Identification (information); 0302 clinical medicine; Gene; computer; 030217 neurology & neurosurgery; 030304 developmental biology |
Source: | BDIoT |
DOI: | 10.1145/3372938.3372986 |
Description: | Searching for homologous sequences is among the most important operations in RNA (ribonucleic acid) sequence analysis because of its essential role in RNA structure prediction, in the identification of conserved motifs and domains, and in phylogenetic analysis. It can also be necessary for determining the function of new genes discovered in the laboratory for which no information is available in databases. In this work, we propose a new program that predicts the family of an unknown RNA sequence based on its secondary structure. Because traditional similar-sequence search methods require significant computation time to give good results, given the ever-growing number of sequences in databases, our program uses the Map-Reduce framework to process very large amounts of data in a reasonable time. Our database was created by gathering sequences from several available databases, including Rfam, SRPDB, the RCSB Protein Data Bank, and tmRDB; most of these sequences were obtained from Rfam. The sequences are sorted by family and saved, together with their associated information, in text (.txt) files. To evaluate the effectiveness of our program, we performed tests using sequences selected at random from the Rfam database. The results show that the program gives good results in all tested cases. |
Database: | OpenAIRE |
External link: |
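The description does not specify how secondary structures are compared or how the Map-Reduce jobs are organized. The following is a minimal single-process sketch of the map, shuffle, and reduce data flow for family prediction. It assumes dot-bracket secondary structures, uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity score, and keeps the best score per family in the reduce step. All function names and the toy records are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Iterable, Iterator

# A database record: (family, sequence_id, dot-bracket secondary structure).
# Hypothetical layout; the paper only states that sequences are sorted by family.
Record = tuple[str, str, str]


def map_phase(query: str, records: Iterable[Record]) -> Iterator[tuple[str, float]]:
    """Map step: emit a (family, similarity) pair for every database record.

    The similarity is difflib's SequenceMatcher ratio on dot-bracket strings,
    an assumed stand-in for whatever structural score the paper actually uses.
    """
    for family, _seq_id, structure in records:
        yield family, SequenceMatcher(None, query, structure).ratio()


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle step: group the emitted scores by their family key."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for family, score in pairs:
        grouped[family].append(score)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Reduce step: keep the best score per family (an assumed aggregation rule)."""
    return {family: max(scores) for family, scores in grouped.items()}


def predict_family(query: str, records: Iterable[Record]) -> tuple[str, float]:
    """Return the family whose best-matching member is most similar to the query."""
    best_per_family = reduce_phase(shuffle(map_phase(query, records)))
    return max(best_per_family.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy records for illustration only; not taken from Rfam or any real database.
    toy_db: list[Record] = [
        ("family_A", "a1", "((((...))))"),
        ("family_A", "a2", "(((.....)))"),
        ("family_B", "b1", "..((..))..((..)).."),
        ("family_B", "b2", ".((..))..((...)).."),
    ]
    family, score = predict_family("((((....))))", toy_db)
    print(family, round(score, 3))
```

In an actual Hadoop-style deployment, map tasks would presumably run in parallel over the per-family `.txt` files, and the framework would perform the shuffle; this sketch only mimics that data flow in one process.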