Whisper: read sorting allows robust mapping of DNA sequencing data
Authors: | Szymon Grabowski, Sebastian Deorowicz, Adam Gudyś, Agnieszka Debudaj-Grabysz |
---|---|
Publication year: | 2018 |
Subject: | Statistics and Probability; Data parallelism; Computer science; Biochemistry; DNA sequencing; Reduction (complexity); 03 medical and health sciences; Software; Molecular Biology; 030304 developmental biology; 0303 health sciences; Genome; Base Sequence; 030302 biochemistry & molecular biology; Process (computing); Sorting; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Pipeline (software); Computer Science Applications; Computational Mathematics; Task (computing); Computational Theory and Mathematics; Complementarity (molecular biology); Data mining; Algorithms; Reference genome |
Source: | Bioinformatics. 35:2043-2050 |
ISSN: | 1367-4811 1367-4803 |
DOI: | 10.1093/bioinformatics/bty927 |
Description: | Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. As sequencing costs fall, algorithms must process growing volumes of generated data in reasonable time. Results: We present Whisper, an accurate, high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays built for the reference genome and its reverse complement. Task and data parallelism, together with storing temporary data on disk, yield superior time efficiency at reasonable memory requirements. Whisper excels on large NGS read collections, in particular Illumina reads at typical WGS coverage. Experiments with real data indicate that Whisper runs in about 15% of the time needed by the well-known BWA-MEM and Bowtie2 tools at comparable accuracy, as validated in a variant calling pipeline. Availability and implementation: Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
External link: |
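
The description mentions sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. The toy sketch below illustrates that general idea only; it is not Whisper's implementation. It assumes exact matching, a naive suffix array construction that is suitable only for tiny inputs, and hypothetical function names (`build_suffix_array`, `find_exact`, `map_reads`). One possible benefit of sorting is shown: because reads are processed in lexicographic order, each binary search can start from the lower bound found for the previous read.

```python
# Toy illustration (not Whisper's code): exact matching of sorted reads
# against suffix arrays of a reference and of its reverse complement.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (tiny inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, read, lo=0):
    """Binary-search `sa` from index `lo` for suffixes starting with `read`.

    Returns (sorted hit positions, lower-bound index). For reads visited in
    sorted order the lower bound never decreases, so it can be reused as `lo`.
    """
    left, right = lo, len(sa)
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + len(read)] < read:
            left = mid + 1
        else:
            right = mid
    start = left
    hits = []
    i = start
    while i < len(sa) and text.startswith(read, sa[i]):
        hits.append(sa[i])
        i += 1
    return sorted(hits), start


def map_reads(reference, reads):
    """Map each distinct read to a list of (forward-strand position, strand)."""
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rc = build_suffix_array(rc)
    n = len(reference)
    results = {}
    lo_fwd = lo_rc = 0
    for read in sorted(set(reads)):  # sorted order lets the searches resume
        fwd_hits, lo_fwd = find_exact(reference, sa_fwd, read, lo_fwd)
        rc_hits, lo_rc = find_exact(rc, sa_rc, read, lo_rc)
        # A hit at position p in the reverse complement corresponds to
        # forward-strand start n - p - len(read).
        results[read] = [(pos, "+") for pos in fwd_hits] + sorted(
            (n - pos - len(read), "-") for pos in rc_hits
        )
    return results


if __name__ == "__main__":
    print(map_reads("GATTACA", ["TTAC", "GTAA", "CCCC"]))
```

A real mapper additionally has to tolerate mismatches and indels, handle paired-end reads, and build suffix arrays with a linear-time or otherwise scalable algorithm; none of that is attempted here.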