Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads
Author: | Joshua W. K. Ho, Andrian Yang, Joshua Y. S. Tang, Michael Troup |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | Read alignment; Computer science; Pseudogene; Pipeline (computing); 0206 medical engineering; Read recovery; 02 engineering and technology; Computational biology; Genome; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; 0302 clinical medicine; Similarity (network science); Differential expression; General Pharmacology, Toxicology and Pharmaceutics; 030304 developmental biology; 0303 health sciences; General Immunology and Microbiology; Software Tool Article; Genetic variants; General Medicine; Articles; 030220 oncology & carcinogenesis; RNA-seq; Unaligned read; 020602 bioinformatics; Reference genome |
Source: | F1000Research |
ISSN: | 2046-1402 |
Description: | Motivation: Read alignment is an important step in RNA-seq analysis, as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align reads which should have been aligned, a problem we term the false-negative non-alignment problem. Results: We have developed Scavenger, a pipeline for recovering unaligned reads using a novel mechanism which utilises information from aligned reads. Scavenger recovers unaligned reads by re-aligning them against a putative location derived from aligned reads that share sequence similarity with the unaligned reads. We show that Scavenger can successfully recover unaligned reads in both simulated and real RNA-seq datasets, including single-cell RNA-seq data. The recovered reads contain more genetic variants than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. We also explored the impact of read recovery on downstream analyses, in particular gene expression analysis, and showed that Scavenger is able both to recover genes which were previously non-expressed and to increase gene expression, with lowly expressed genes most affected by the addition of recovered reads. We also found that the majority of genes with >1 fold change in expression after recovery are categorised as pseudogenes, indicating that pseudogene expression can be affected by the false-negative non-alignment problem. Scavenger helps to solve the false-negative non-alignment problem through recovery of unaligned reads using information from previously aligned reads. Availability: Scavenger is available under an open source license at https://github.com/VCCRI/Scavenger/. Contact: j.ho@victorchang.edu.au |
Database: | OpenAIRE |
External link: |
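
The recovery idea summarised in the abstract (find aligned reads that resemble an unaligned read, take their position as a putative location, then re-align the unaligned read there) can be illustrated with a toy sketch. This is not Scavenger's implementation: the k-mer voting, the Hamming-distance re-alignment, and every name and parameter below (`build_index`, `recover`, `k`, `flank`, `max_mismatches`) are assumptions made purely for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(aligned_reads, k):
    """Map each k-mer to the reference start positions of aligned reads containing it.

    aligned_reads is an iterable of (sequence, reference_start) pairs.
    """
    index = defaultdict(set)
    for seq, pos in aligned_reads:
        for km in kmers(seq, k):
            index[km].add(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def recover(read, index, reference, k=4, flank=5, max_mismatches=2):
    """Try to place an unaligned read near a similar aligned read.

    Returns (start, mismatches) for the best placement, or None if no
    similar aligned read exists or no placement is within max_mismatches.
    """
    # Step 1: vote for putative locations using k-mers shared with aligned reads.
    votes = defaultdict(int)
    for km in kmers(read, k):
        for pos in index.get(km, ()):
            votes[pos] += 1
    if not votes:
        return None
    putative = max(votes, key=lambda p: (votes[p], -p))

    # Step 2: re-align only within a small window around the putative location,
    # tolerating a few mismatches (hypothetical stand-in for a real aligner).
    lo = max(0, putative - flank)
    hi = min(len(reference) - len(read), putative + flank)
    best = None
    for start in range(lo, hi + 1):
        d = hamming(read, reference[start:start + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (start, d)
    return best


# Toy data: one aligned read at position 10, and an unaligned read that
# overlaps it but carries two variants relative to the reference.
reference = "ACGTTGCAAGGCTTACGGATCCATGCATTGACC"
index = build_index([("GCTTACGGAT", 10)], k=4)
print(recover("TGACGGATCA", index, reference))
```

Restricting the second step to a window around the putative location is what lets a more permissive mismatch threshold be used without searching the whole reference, which fits the abstract's observation that recovered reads carry more genetic variants than previously aligned reads.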