CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.
Authors: | Deschamps-Francoeur G; Department of Biochemistry and RNA Group, Université de Sherbrooke, Sherbrooke, QC, Canada., Boivin V; Department of Biochemistry and RNA Group, Université de Sherbrooke, Sherbrooke, QC, Canada., Abou Elela S; Department of Microbiology and Infectiology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada., Scott MS; Department of Biochemistry and RNA Group, Université de Sherbrooke, Sherbrooke, QC, Canada. |
---|---|
Language: | English |
Source: | Bioinformatics (Oxford, England) [Bioinformatics] 2019 Dec 01; Vol. 35 (23), pp. 5039-5047. |
DOI: | 10.1093/bioinformatics/btz433 |
Abstract: | Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. Results: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. Availability and Implementation: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2019. Published by Oxford University Press.) |
Database: | MEDLINE |
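
The abstract states that CoCo proportionally distributes multimapped reads between repeated sequences. The following is a minimal sketch of what such a proportional split could look like, not CoCo's actual implementation. It assumes, purely for illustration, that each candidate gene is weighted by its uniquely mapped read count and that reads are split equally when no candidate has unique reads; the function name `distribute_multimapped`, the input layout, and the gene names are all hypothetical.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_groups):
    """Split multimapped reads among candidate genes.

    unique_counts: dict mapping gene name -> uniquely mapped read count.
    multimapped_groups: iterable of (genes, n_reads) pairs, where `genes`
        is a tuple of candidate genes sharing `n_reads` multimapped reads.

    Assumption (not taken from the paper): the weight of each gene is its
    uniquely mapped read count, with an equal split as a fallback.
    """
    counts = defaultdict(float)
    for gene, n in unique_counts.items():
        counts[gene] += n

    for genes, n_reads in multimapped_groups:
        weights = [unique_counts.get(g, 0) for g in genes]
        total = sum(weights)
        for gene, weight in zip(genes, weights):
            # Proportional share, or an equal share if no gene has unique reads.
            share = weight / total if total > 0 else 1 / len(genes)
            counts[gene] += n_reads * share

    return dict(counts)


# Hypothetical example: 8 reads map equally well to geneA and geneB.
unique = {"geneA": 30, "geneB": 10, "geneC": 5}
groups = [(("geneA", "geneB"), 8)]
corrected = distribute_multimapped(unique, groups)
```

In this sketch the total number of reads is preserved: the 8 shared reads are divided 3:1 between `geneA` and `geneB`, following their unique counts, while `geneC` is left unchanged.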