ClassifyTE: A stacking based prediction of hierarchical classification of transposable elements
Author: | Tamjidul Hoque, Avdesh Mishra, Manisha Panta, Joel Atallah |
---|---|
Publication year: | 2020 |
Subject: |
Statistics and Probability
Transposable element; Source code; Computer science; media_common.quotation_subject; 02 engineering and technology; Computational biology; medicine.disease_cause; Biochemistry; Somatic evolution in cancer; Genome; DNA sequencing; Germline; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; medicine; Taxonomic rank; Molecular Biology; 030304 developmental biology; media_common; 0303 health sciences; Mutation; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Mutation (genetic algorithm); Benchmark (computing); 020201 artificial intelligence & image processing |
Source: | Bioinformatics (Oxford, England). |
ISSN: | 1367-4811 |
Description: | Motivation: Transposable elements (TEs), or jumping genes, are DNA sequences with an intrinsic capability to move from one location to another within a host genome. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also increase the mutation rate and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. Proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier that can explain the role of TEs in germline and somatic evolution more accurately is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust stacking-based ML method, ClassifyTE, for the hierarchical classification of TEs with high accuracy. Results: We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved average percentage improvements of 4%, 10.68% and 10.13% (using the hF measure) over several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs. Availability and implementation: The source code and data are available at https://github.com/manisa/ClassifyTE. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
External link: |
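
Note on the hF measure: the abstract reports results with the hF (hierarchical F) measure but does not define it. The sketch below shows the definition commonly used in the hierarchical classification literature, where each true and predicted label is expanded to include all of its ancestors before precision and recall are computed. It is an illustration only, not code from the ClassifyTE repository. The path-style label encoding (for example `1/1/2`) and the function names `ancestors_inclusive` and `hierarchical_f` are hypothetical, and it is an assumption that the paper applies exactly this form of the measure.

```python
def ancestors_inclusive(label, sep="/"):
    """Expand a path-coded label into itself plus all its ancestors.

    Hypothetical encoding: '1/1/2' -> {'1', '1/1', '1/1/2'}.
    """
    parts = label.split(sep)
    return {sep.join(parts[:i]) for i in range(1, len(parts) + 1)}


def hierarchical_f(true_labels, pred_labels, sep="/"):
    """Return (hP, hR, hF) summed over all instances.

    hP = sum |P_i & T_i| / sum |P_i|
    hR = sum |P_i & T_i| / sum |T_i|
    hF = 2 * hP * hR / (hP + hR)
    where T_i and P_i are the ancestor-expanded true and predicted label sets.
    """
    overlap = pred_total = true_total = 0
    for true_label, pred_label in zip(true_labels, pred_labels):
        true_set = ancestors_inclusive(true_label, sep)
        pred_set = ancestors_inclusive(pred_label, sep)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    hp = overlap / pred_total if pred_total else 0.0
    hr = overlap / true_total if true_total else 0.0
    hf = 2 * hp * hr / (hp + hr) if (hp + hr) else 0.0
    return hp, hr, hf


if __name__ == "__main__":
    # Toy labels (made up): the first prediction is wrong only at the deepest level.
    truth = ["1/1/2", "2/1"]
    predicted = ["1/1/3", "2/1"]
    hp, hr, hf = hierarchical_f(truth, predicted)
    print(f"hP={hp:.2f} hR={hr:.2f} hF={hf:.2f}")
```

Because ancestors count toward the overlap, a prediction that is correct at the higher levels but wrong at the deepest level still earns partial credit, which is why a measure of this kind suits a taxonomy such as the TE class hierarchy.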