TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts
Authors: | Dana Wyman, Ali Mortazavi |
---|---|
Year of publication: | 2018 |
Subject: | Statistics and Probability; Computer science; Gene Expression; Computational biology; Transcript isoforms; Biochemistry; Genome; 03 medical and health sciences; Exon; INDEL Mutation; Humans; Protein Isoforms; splice; Indel; Molecular Biology; 030304 developmental biology; 0303 health sciences; Extramural; 030302 biochemistry & molecular biology; Computational Biology; Exons; Applications Notes; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Software; Reference genome |
Source: | Bioinformatics |
ISSN: | 1367-4811 1367-4803 |
Description: | Motivation: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. Results: Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-Seq dataset and rescues 39% of noncanonical splice junctions. Availability and implementation: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean. |
Database: | OpenAIRE |
External link: |