mirExplorer: Detecting microRNAs from genome and next generation sequencing data using the AdaBoost method with transition probability matrix and combined features
Authors: | Zhen-Hua Qu, Daogang Guan, Liang-Hu Qu, Ying Zhang, Jian-You Liao |
---|---|
Year of publication: | 2011 |
Subject: | Genetics, Whole genome sequencing, Boosting (machine learning), Base Sequence, Sequence Analysis RNA, Sequence analysis, Chromosome Mapping, Computational Biology, High-Throughput Nucleotide Sequencing, Cell Biology, Computational biology, Biology, Genome, DNA sequencing, Set (abstract data type), MicroRNAs, Animals, Humans, AdaBoost, Molecular Biology, Test data |
Source: | RNA Biology. 8:922-934 |
ISSN: | 1555-8584, 1547-6286 |
DOI: | 10.4161/rna.8.5.16026 |
Description: | MicroRNAs (miRNAs) are an abundant group of small regulatory non-coding RNAs in eukaryotes. Next-generation sequencing (NGS) technologies now allow small RNAs (sRNAs) to be detected systematically and genomes to be sequenced de novo, quickly and at low cost. As a result, there is a growing need for fast miRNA prediction tools that can annotate miRNAs in diverse organisms with high accuracy from either the genome sequence or NGS data. Several miRNA predictors have been proposed for this purpose, but the accuracy of existing predictors and their applicability across multiple species still needed improvement. Here we present mirExplorer, a novel prediction tool based on an integrated adaptive boosting (AdaBoost) method. It contains two modules: mirExplorer-genome, designed to predict pre-miRNAs de novo from genome sequences, and mirExplorer-NGS, designed to discover miRNAs from NGS data. A set of novel features describing pre-miRNA secondary structure and miRNA biogenesis was extracted to distinguish real pre-miRNAs from pseudo ones. Outer ten-fold cross-validation of mirExplorer-genome on human data yielded a specificity of 95.03% and a sensitivity of 93.71%. On test data from 16 species, mirExplorer-genome achieved an overall accuracy of 95.53%. Systematic outer ten-fold cross-validation of the mirExplorer-NGS model achieved a specificity of 98.3% and a sensitivity of 97.72%, and this model maintained good performance on test datasets from species ranging from vertebrates to plants. mirExplorer is available both as a web server and as a software package at http://biocenter.sysu.edu.cn/mir/. |
Database: | OpenAIRE |
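The record says mirExplorer pairs an AdaBoost classifier with transition-probability-matrix features and reports its results as sensitivity, specificity and accuracy. The Python sketch below illustrates those two ingredients under stated assumptions. It treats the transition probability matrix as a first-order Markov matrix over the nucleotides A, C, G and U of a candidate sequence; the paper's actual feature may be defined differently (for example, over secondary-structure states). The metrics use the standard confusion-matrix definitions. The function names `transition_probability_matrix` and `classification_metrics` are hypothetical and are not part of the mirExplorer software.

```python
from __future__ import annotations

from itertools import product

NUCLEOTIDES = "ACGU"


def transition_probability_matrix(seq: str) -> dict[tuple[str, str], float]:
    """Hypothetical feature: first-order P(next nucleotide | current nucleotide).

    Assumption: the matrix is built over A, C, G, U only; DNA 'T' is read as 'U',
    other symbols are skipped, and a row with no outgoing transitions stays all zero.
    """
    seq = seq.upper().replace("T", "U")
    counts = {pair: 0 for pair in product(NUCLEOTIDES, repeat=2)}
    # Count each adjacent pair (a -> b) in the sequence.
    for a, b in zip(seq, seq[1:]):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            counts[(a, b)] += 1
    probs: dict[tuple[str, str], float] = {}
    # Normalise each row so its probabilities sum to 1 (or stay 0 if unseen).
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return probs


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction classified correctly
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    tpm = transition_probability_matrix("ACGUACGU")
    feature_vector = [tpm[pair] for pair in product(NUCLEOTIDES, repeat=2)]
    print(len(feature_vector), tpm[("A", "C")])
    # Made-up illustrative counts, not results from the paper.
    print(classification_metrics(tp=90, tn=95, fp=5, fn=10))
```

The 16 matrix entries can be flattened into a fixed-length feature vector for a boosting classifier. The counts passed to `classification_metrics` are invented for illustration and do not reproduce any figure reported in the record.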