Specificity control for read alignments using an artificial reference genome-guided false discovery rate

Authors: Sven H. Giese, Bernhard Y. Renard, Franziska Zickmann
Year of publication: 2013
Subject:
Source: Bioinformatics. 30:9-16
ISSN: 1367-4811, 1367-4803
DOI: 10.1093/bioinformatics/btt255
Description: Motivation: Accurate estimation, comparison and evaluation of read mapping error rates are crucial steps in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches either focus on sensitivity estimation, and thereby disregard specificity, or are based on read simulations. Although continuously improving, read simulations are still prone to introducing a bias into the quantification of mapping errors and cannot capture all characteristics of an individual dataset. Results: We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. It can therefore be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications. Availability: The ARDEN source code is freely available at http://sourceforge.net/projects/arden/. Contact: renardb@rki.de Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE
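
Illustration: The description states that error rates are estimated from real reads with the help of an additionally generated artificial reference genome. The Python sketch below shows one simplified way such an estimate could be computed. It is not ARDEN's implementation. It assumes, hypothetically, that every alignment carries a numeric score where higher is better, and that the number of alignments to the artificial reference at a given score cutoff approximates the number of false positives among alignments to the real reference. The function names estimate_fdr and fdr_curve are invented for this example.

```python
from typing import List, Tuple


def estimate_fdr(real_scores: List[float],
                 artificial_scores: List[float],
                 threshold: float) -> float:
    """Estimate the false discovery rate at a score cutoff.

    Assumption (not taken from the paper): alignments to the artificial
    reference that pass the cutoff stand in for the false positives
    among alignments to the real reference that pass the same cutoff.
    """
    real_hits = sum(1 for s in real_scores if s >= threshold)
    artificial_hits = sum(1 for s in artificial_scores if s >= threshold)
    if real_hits == 0:
        # No accepted alignments, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, artificial_hits / real_hits)


def fdr_curve(real_scores: List[float],
              artificial_scores: List[float]) -> List[Tuple[float, float]]:
    """Return (threshold, estimated FDR) pairs for every observed score."""
    thresholds = sorted(set(real_scores) | set(artificial_scores))
    return [(t, estimate_fdr(real_scores, artificial_scores, t))
            for t in thresholds]


if __name__ == "__main__":
    # Made-up scores, for demonstration only.
    real = [60, 60, 50, 40, 30, 20]
    artificial = [30, 20, 10]
    for threshold, fdr in fdr_curve(real, artificial):
        print(f"cutoff {threshold}: estimated FDR {fdr:.2f}")
```

Scanning the cutoff in this way is what would let a user pick mapper parameters or filter alignments at a chosen error level, which is the kind of use the description mentions.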