miRBoost: Boosting support vector machines for microRNA precursor classification
Author: | Farida Zehraoui, Fariza Tahi, Van Du T. Tran, Sébastien Tempel, Benjamin Zerath |
---|---|
Contributors: | Informatique, Biologie Intégrative et Systèmes Complexes (IBISC), Université d'Évry-Val-d'Essonne (UEVE), Swiss Institute of Bioinformatics [Lausanne] (SIB), Université de Lausanne = University of Lausanne (UNIL), Laboratoire de chimie bactérienne (LCB), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Université de Lausanne (UNIL) |
Language: | English |
Publication year: | 2015 |
Subject: | Web server; Boosting (machine learning); Support vector machine; Bioinformatics; In silico; Information Storage and Retrieval; Feature selection; Biology; Positive data; Machine learning; Sensitivity and Specificity; Software; Databases, Genetic; RNA Precursors; Animals; Humans; Molecular Biology; Training set; Computational Biology; MicroRNAs; Artificial intelligence; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; Sequence Alignment |
Source: | RNA, Cold Spring Harbor Laboratory Press, 2015, 21 (5), pp. 775-785. ⟨10.1261/rna.043612.113⟩ |
ISSN: | 1355-8382 1469-9001 |
DOI: | 10.1261/rna.043612.113 |
Description: | Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques, combined with the huge amount of data from new sequencing technologies, have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among the various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However, these approaches require training data, and the imbalance between the number of miRNAs (positive data) and non-miRNAs (negative data) degrades their performance. To address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed following a feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better than state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods for discovering novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for use in genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr. |
Database: | OpenAIRE |
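The description mentions boosting with support vector machine components on imbalanced training data. As a rough illustration of that general idea only, the sketch below runs a plain AdaBoost loop over small weighted linear SVMs on a toy imbalanced set. It is not the miRBoost implementation: the function names, the two synthetic features, the hyperparameters, and the choice of starting each class with half of the total sample weight are all assumptions made for this example.

```python
import math
import random


def train_linear_svm(X, y, weights, epochs=200, lam=0.01, lr=0.1):
    """Weighted linear SVM fitted by batch subgradient descent on the hinge loss."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # L2 regularisation term
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(dim):
                    grad_w[j] -= ci * yi * xi[j]
                grad_b -= ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def boost_svms(X, y, rounds=5):
    """AdaBoost over weighted linear SVMs; labels must be +1 or -1."""
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = len(y) - n_pos
    # Assumption for this sketch: each class starts with half the total weight,
    # so the rare positive class is not swamped by the negatives.
    weights = [0.5 / n_pos if yi == 1 else 0.5 / n_neg for yi in y]
    ensemble = []
    for _ in range(rounds):
        model = train_linear_svm(X, y, weights)
        preds = [svm_predict(model, xi) for xi in X]
        err = sum(c for c, p, yi in zip(weights, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance on the weighted sample
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, model))
        if err == 0:  # nothing left to re-weight
            break
        # Increase the weight of misclassified examples, then renormalise.
        weights = [c * math.exp(-alpha * yi * p)
                   for c, p, yi in zip(weights, preds, y)]
        total = sum(weights)
        weights = [c / total for c in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    score = sum(alpha * svm_predict(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1


def make_toy_data(seed=0, n_pos=5, n_neg=50):
    """Synthetic 1:10 imbalanced data with two made-up features."""
    rng = random.Random(seed)
    X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(n_pos)]
    X += [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(n_neg)]
    y = [1] * n_pos + [-1] * n_neg
    return X, y


X, y = make_toy_data()
ensemble = boost_svms(X, y)
print(len(ensemble), ensemble_predict(ensemble, [3.0, 3.0]))
```

In a real setting the feature vectors would come from the selected sequence and structure features rather than from two synthetic coordinates, and a kernel SVM from an established library would normally replace the hand-written linear learner used here to keep the example dependency-free.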