Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach
Authors: | Pengyao Ping, Michael Blumenstein, Xuan Zhang, Gyorgy Hutvagner, Jinyan Li |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
Gene isoform; AcademicSubjects/SCI00010; Word error rate; Computational biology; Biology; 03 medical and health sciences; 0302 clinical medicine; IsomiR; Salmon; Databases, Genetic; Genetics; Animals; Humans; Indel; 05 Environmental Sciences; 06 Biological Sciences; 08 Information and Computing Sciences; Narese/9; 030304 developmental biology; 0303 health sciences; Computational Biology; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Base (topology); MicroRNAs; MRNA Sequencing; k-mer; 030220 oncology & carcinogenesis; Methods Online; Error detection and correction; Algorithms; Developmental Biology |
Source: | Nucleic Acids Research |
ISSN: | 1362-4962 0305-1048 |
Description: | Raw sequencing reads of miRNAs contain machine-made substitution errors and, occasionally, insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise correction of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, the study of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because miRNA reads differ from DNA or mRNA sequencing reads in length and per-read coverage. We present a novel lattice structure combining k-mers, (k − 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving the data quality from one error per 50 reads to one error per 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often corrected at the 5′ ends and in the seed regions of the reads, and that correction leads to remarkable changes in miRNA isoform abundance, the volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities. |
Database: | OpenAIRE |
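
The abstract names a lattice that combines k-mers, (k − 1)-mers and (k + 1)-mers but does not spell out its construction. The following is a minimal illustrative sketch of the general idea, not the authors' algorithm: it counts substrings at the three lengths and, for a rare k-mer, looks up well-supported sequences one edit away, so that substitutions are searched among k-mers and indels among the (k − 1)- and (k + 1)-mers. The function names, the `min_count` threshold and the toy reads are all assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_lattice(reads, k):
    """Hypothetical three-level lattice: counts of (k-1)-, k- and (k+1)-mers."""
    return {size: count_kmers(reads, size) for size in (k - 1, k, k + 1)}


def neighbours(kmer):
    """Yield (kind, sequence) for every sequence one edit away from kmer."""
    for i in range(len(kmer)):
        # Deleting base i gives a (k-1)-mer.
        yield "deletion", kmer[:i] + kmer[i + 1:]
        for base in ALPHABET:
            if base != kmer[i]:
                # Replacing base i gives another k-mer.
                yield "substitution", kmer[:i] + base + kmer[i + 1:]
    for i in range(len(kmer) + 1):
        for base in ALPHABET:
            # Inserting a base before position i gives a (k+1)-mer.
            yield "insertion", kmer[:i] + base + kmer[i:]


def candidate_corrections(kmer, lattice, min_count):
    """Return well-supported one-edit neighbours of kmer, most frequent first.

    kmer must have the length k that the lattice was built with.
    """
    found = {}
    for kind, seq in neighbours(kmer):
        count = lattice[len(seq)][seq]  # Counter returns 0 for unseen keys
        if count >= min_count:
            found[seq] = (kind, count)
    ranked = [(seq, kind, count) for seq, (kind, count) in found.items()]
    return sorted(ranked, key=lambda item: -item[2])


# Toy data (invented): ten copies of a read plus one copy that lost a base.
reads = ["ACGTAC"] * 10 + ["ACGAC"]
lattice = build_lattice(reads, 5)
print(candidate_corrections("ACGAC", lattice, min_count=5))
```

In this toy case the only well-supported neighbour of the rare 5-mer `ACGAC` is the 6-mer `ACGTAC`, reached by inserting one base, which is why a level of (k + 1)-mers is useful for repairing deletions. A real corrector would also need rules for deciding which rare sequences are errors rather than genuine isomiRs, which this sketch does not attempt.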