Novel searching program for RNA homologous sequences based on Map-Reduce Framework

Authors: Ali Bekri, Abdelhakim El Fatmi, Said Benhlima
Year of publication: 2019
Subject:
Source: BDIoT
DOI: 10.1145/3372938.3372986
Description: Searching for homologous sequences is among the most important operations in RNA (ribonucleic acid) sequence analysis because of its essential role in RNA structure prediction, the identification of conserved motifs and domains, and phylogenetic analysis. It can also be necessary for determining the function of new genes discovered in the laboratory for which no information is available in databases. In this work, we propose a new program for predicting the family of an unknown RNA sequence based on its secondary structure. Because traditional similarity-search methods require significant computation time to give good results, given the ever-growing number of sequences in databases, our program uses the Map-Reduce framework to process very large amounts of data in a reasonable time. Our database was built by gathering sequences from several available databases, including Rfam, SRPDB, the RCSB Protein Data Bank, and tmRDB; most of the sequences were obtained from Rfam. The sequences are sorted by family and saved, together with their associated information, in text (.txt) files. To evaluate the effectiveness of our program, we performed tests using sequences selected at random from the Rfam database. The program gave good results in all of the cases tested.
Database: OpenAIRE