Specificity control for read alignments using an artificial reference genome-guided false discovery rate

Authors:	Sven H. Giese, Bernhard Y. Renard, Franziska Zickmann
Year of publication:	2013
Subject:	Statistics and Probability; False discovery rate; Source code; Computer science; Polymorphism, Single Nucleotide; Biochemistry; Genome; Reduction (complexity); False positive paradox; Animals; Amino Acid Sequence; Sensitivity (control systems); Caenorhabditis elegans; Molecular Biology; Peptide sequence; Base Sequence; High-Throughput Nucleotide Sequencing; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Benchmark (computing); Data mining; Algorithms; Software; Reference genome
Source:	Bioinformatics, 30:9-16
ISSN:	1367-4811 (online), 1367-4803 (print)
DOI:	10.1093/bioinformatics/btt255
Description:	Motivation: Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in processing next-generation sequencing data, because subsequent analysis steps and their interpretation assume that the mapping results are correct. Current approaches either focus on sensitivity estimation, and thereby disregard specificity, or rely on read simulations. Although read simulations are continuously improving, they are still prone to introducing bias into the quantitation of mapping errors and cannot capture all characteristics of an individual dataset.
Results: We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates the error rates of read mappers from real experimental reads, using an additionally generated artificial reference genome. It allows dataset-specific computation of error rates and construction of a receiver operating characteristic (ROC) curve. It can therefore be used to optimize read mapper parameters, to select a read mapper for a specific problem, or to filter alignments based on quality estimates. The use of ARDEN is demonstrated in a general read mapper comparison, in a parameter optimization for one read mapper, and in an application example in single-nucleotide polymorphism discovery that shows a significant reduction in the number of false positive identifications.
Availability: The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.
Contact: renardb@rki.de
Supplementary information: Supplementary data are available at Bioinformatics online.
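The core idea described above, treating alignments to an artificial reference genome as false positives and sweeping a quality threshold to trace an ROC-style curve, can be sketched as follows. This is a simplified illustration, not ARDEN's implementation: the `ReadHit` record, its `quality` and `on_artificial` fields, and the functions `roc_points` and `min_quality_for_fdr` are hypothetical, and the sketch assumes each read's best hit has already been labelled as lying on the real or the artificial reference. ARDEN's actual rules for generating the artificial genome and classifying reads are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReadHit:
    """Best alignment of one read (hypothetical record, not an ARDEN type)."""
    read_id: str
    quality: float       # mapping quality or score; higher means more confident
    on_artificial: bool  # assumption: True if the best hit lies on the artificial genome


def roc_points(hits: List[ReadHit]) -> List[Tuple[float, int, int]]:
    """Return (threshold, true_positives, false_positives) for every distinct
    quality value, accepting all hits with quality >= threshold.

    Hits on the real reference count as putative true positives and
    hits on the artificial reference count as false positives.
    """
    ordered = sorted(hits, key=lambda h: h.quality, reverse=True)
    points: List[Tuple[float, int, int]] = []
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i].quality
        # Consume every hit tied at this quality before emitting a point.
        while i < len(ordered) and ordered[i].quality == threshold:
            if ordered[i].on_artificial:
                fp += 1
            else:
                tp += 1
            i += 1
        points.append((threshold, tp, fp))
    return points


def min_quality_for_fdr(hits: List[ReadHit], max_fdr: float) -> Optional[float]:
    """Lowest quality threshold whose estimated FDR, fp / (tp + fp), is at most
    max_fdr; None if no threshold qualifies."""
    best: Optional[float] = None
    # Thresholds arrive in descending order, so the last match is the lowest.
    for threshold, tp, fp in roc_points(hits):
        # tp + fp >= 1 because each point adds at least one hit.
        if fp / (tp + fp) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    example = [
        ReadHit("r1", 60, False),
        ReadHit("r2", 60, False),
        ReadHit("r3", 30, True),
        ReadHit("r4", 30, False),
        ReadHit("r5", 10, True),
    ]
    print(roc_points(example))                 # → [(60, 2, 0), (30, 3, 1), (10, 3, 2)]
    print(min_quality_for_fdr(example, 0.25))  # → 30
```

Plotting the false positive counts against the true positive counts over all thresholds gives the ROC-style curve, and choosing a threshold from it corresponds to the quality-based alignment filtering mentioned in the abstract.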
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0273a44ce3c139bfff77319abee7ae5c https://doi.org/10.1093/bioinformatics/btt255