T-lex3: an accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data
Authors: | Josep M. Casacuberta, Anna-Sophie Fiston-Lavier, Pol Vendrell-Mir, Raúl Castanera, Maite G. Barrón, María Bogaerts-Márquez, Josefa González |
---|---|
Contributors: | European Commission, Generalitat de Catalunya, Ministerio de Economía y Competitividad (España) |
Year of publication: | 2019 |
Subject: |
Statistics and Probability, Transposable element, Genotype, Population, Computational biology, Biology, Biochemistry, Genome, 03 medical and health sciences, 0302 clinical medicine, Gene Frequency, Genetic variation, Animals, Humans, education, Molecular Biology, Genotyping, 030304 developmental biology, Whole genome sequencing, 0303 health sciences, Whole Genome Sequencing, Genetics and Population Analysis, food and beverages, Original Papers, 3. Good health, Computer Science Applications, Computational Mathematics, Identification (information), Drosophila melanogaster, Computational Theory and Mathematics, DNA Transposable Elements, 030217 neurology & neurosurgery |
Source: | Recercat. Dipòsit de la Recerca de Catalunya; Digital.CSIC. Repositorio Institucional del CSIC; Bioinformatics; Varias* (Consorci de Biblioteques Universitàries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya); Dipòsit Digital de Documents de la UAB, Universitat Autònoma de Barcelona |
ISSN: | 1367-4811, 1367-4803 |
Description: | [Motivation] Transposable elements (TEs) constitute a significant proportion of most genomes sequenced to date and are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs is therefore crucial for a complete identification of the genetic differences among individuals, populations and species.
[Results] In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have redesigned the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be run on longer short reads (e.g. reads >150 bp). We have added new filtering steps to increase genotyping accuracy, and new parameters that let the user control the minimum and maximum number of reads and the minimum number of strains used to genotype a TE insertion. We also show, for the first time, that T-lex3 provides accurate TE calls in a plant genome. To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 in humans and 3067 in the rice genome. We show that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents.
[Availability and implementation] T-lex3 is available on GitHub: https://github.com/GonzalezLab/T-lex3.
This work was supported by the European Commission (H2020-ERC-2014-CoG-647900) and by the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). Work done at CRAG was partially funded by a grant from the Ministerio de Economía y Competitividad (AGL2016-78992-R). |
Database: | OpenAIRE |
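The abstract mentions user-controlled filters on the minimum and maximum number of reads and on the minimum number of strains needed to genotype a TE insertion. As an illustration only, the Python sketch below shows how such filters could feed a population-frequency estimate. It is not T-lex3's code or command-line interface: the `StrainCall` record, the `estimate_frequency` function, its parameter names, and the rule of counting only "present"/"absent" calls are hypothetical assumptions. The T-lex3 repository documents the actual options.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StrainCall:
    """Hypothetical per-strain genotype call for one reference TE insertion."""
    strain: str
    genotype: str    # assumed labels: "present" or "absent"; any other label is ignored
    read_count: int  # assumed: number of reads informative for this call


def estimate_frequency(
    calls: Iterable[StrainCall],
    min_reads: int,
    max_reads: int,
    min_strains: int,
) -> Optional[float]:
    """Return the fraction of usable strains carrying the insertion, or None.

    Hypothetical sketch, not T-lex3's algorithm: a strain is usable only if its
    read count lies within [min_reads, max_reads] and its call is "present" or "absent".
    """
    usable = [
        c for c in calls
        if min_reads <= c.read_count <= max_reads and c.genotype in ("present", "absent")
    ]
    # Too few genotyped strains (or none at all): report no frequency for this insertion.
    if not usable or len(usable) < min_strains:
        return None
    present = sum(1 for c in usable if c.genotype == "present")
    return present / len(usable)


calls = [
    StrainCall("s1", "present", 12),
    StrainCall("s2", "absent", 8),
    StrainCall("s3", "present", 2),    # below min_reads: filtered out
    StrainCall("s4", "present", 150),  # above max_reads: filtered out
    StrainCall("s5", "absent", 20),
]
print(estimate_frequency(calls, min_reads=3, max_reads=100, min_strains=3))  # 0.3333333333333333
print(estimate_frequency(calls, min_reads=3, max_reads=100, min_strains=4))  # None
```

Returning `None` rather than `0.0` keeps "not enough strains passed the filters" distinct from "the insertion is absent in every genotyped strain".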