Covering all your bases: incorporating intron signal from RNA-seq data
Author: | Aliaksei Holik, Charity W. Law, Albert Y Zhang, Matthew E. Ritchie, Stuart Lee, Ashley P. Ng, Marie Liesse Asselin-Labat, Shian Su |
---|---|
Year of publication: | 2020 |
Subject: |
AcademicSubjects/SCI01140; 0303 health sciences; AcademicSubjects/SCI01060; Mature messenger RNA; AcademicSubjects/SCI00030; SIGNAL (programming language); Intron; RNA; RNA-Seq; Standard Article; Computational biology; Biology; AcademicSubjects/SCI01180; 03 medical and health sciences; Exon; Exploratory data analysis; 0302 clinical medicine; 030220 oncology & carcinogenesis; Human genome; AcademicSubjects/SCI00980; Gene; 030217 neurology & neurosurgery; Index method; 030304 developmental biology |
Source: | NAR Genomics and Bioinformatics |
ISSN: | 2631-9268 |
DOI: | 10.1093/nargab/lqaa073 |
Description: | RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered informative, since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software, which quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had coverage levels similar to those of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general. |
Database: | OpenAIRE |
External link: |