Whisper: read sorting allows robust mapping of DNA sequencing data

Authors: Szymon Grabowski, Sebastian Deorowicz, Adam Gudyś, Agnieszka Debudaj-Grabysz
Year of publication: 2018
Subject:
Source: Bioinformatics. 35:2043-2050
ISSN: 1367-4811, 1367-4803
DOI: 10.1093/bioinformatics/bty927
Description: Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Falling sequencing costs create a need for algorithms able to process growing amounts of generated data in reasonable time. Results: We present Whisper, an accurate, high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays built for the reference genome and its reverse complement. Employing task and data parallelism, as well as storing temporary data on disk, results in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. Experiments with real data indicate that our solution runs in about 15% of the time needed by the well-known BWA-MEM and Bowtie2 tools at comparable accuracy, as validated in a variant calling pipeline. Availability and implementation: Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/. Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE
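
Illustration: the abstract describes sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. The Python sketch below is a toy version of that idea and is not Whisper's implementation. It assumes exact matching only, a naive suffix array construction, and in-memory data. All function names (reverse_complement, build_suffix_array, lower_bound, occurrences, map_reads) are hypothetical. It shows one possible benefit of sorting: because the reads are visited in lexicographic order, each binary search can start where the previous one ended.

```python
# Complement table for the four DNA bases (toy: no N or lowercase handling).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order.

    Quadratic-ish and only suitable for tiny examples.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lower_bound(text, sa, pattern, start=0):
    """Index in sa of the first suffix that is >= pattern, searching from start."""
    m = len(pattern)
    lo, hi = start, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo


def occurrences(text, sa, pattern, lo):
    """Collect text positions of suffixes starting with pattern, from sa[lo] on."""
    m = len(pattern)
    hits = []
    i = lo
    while i < len(sa) and text[sa[i]:sa[i] + m] == pattern:
        hits.append(sa[i])
        i += 1
    return sorted(hits)


def map_reads(reference, reads):
    """Map reads exactly to both strands; returns {read: [(strand, position)]}.

    Positions are 0-based and reported in forward-strand coordinates.
    """
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rc = build_suffix_array(rc)
    results = {}
    lo_fwd = lo_rc = 0
    # Sorted reads give non-decreasing lower bounds, so each search
    # resumes from the previous bound instead of from index 0.
    for read in sorted(set(reads)):
        lo_fwd = lower_bound(reference, sa_fwd, read, lo_fwd)
        lo_rc = lower_bound(rc, sa_rc, read, lo_rc)
        hits = [("+", p) for p in occurrences(reference, sa_fwd, read, lo_fwd)]
        # Convert a reverse-complement hit back to forward-strand coordinates.
        hits += [("-", len(reference) - p - len(read))
                 for p in occurrences(rc, sa_rc, read, lo_rc)]
        results[read] = hits
    return results
```

For example, map_reads("GATTACAGG", ["TTAC", "TGTA", "CCCC"]) reports "TTAC" on the forward strand and "TGTA" on the reverse strand, with no hit for "CCCC". A practical mapper would also need to tolerate mismatches and indels, which this sketch does not attempt.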