Estimating optimal window size for analysis of low-coverage next-generation sequence data

Authors: Ibrahim Nafisah, Charles C. Taylor, Henry M. Wood, Stefano Berri, Arief Gusnanto, Pamela Rabbitts
Year of publication: 2014
Subject:
Source: Bioinformatics. 30:1823-1829
ISSN: 1367-4811, 1367-4803
Description: Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing ( Results: We assume the read density to be a step function. Given this model, we propose a data-based estimation of the optimal window size based on Akaike’s information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curves as functions of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are general-purpose, and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets. Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/~arief/R/win/. Contact: a.gusnanto@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE
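
The description above outlines window-size selection by treating the read density as a step function and comparing AIC and CV log-likelihood across window sizes. The following Python sketch illustrates that general idea with a plain histogram density estimate; it is not the authors' R package, and the exact criteria used in the paper may differ. All function names (`bin_counts`, `aic`, `cv_loglik`, `select_window`, `simulate_reads`), the leave-one-out form of the CV log-likelihood, the simulated region of 10,000 positions, and the candidate window sizes are assumptions made for illustration. Candidate windows are assumed to divide the region length evenly.

```python
import math
import random


def bin_counts(positions, window, start, end):
    """Count reads in consecutive windows of width `window` on [start, end)."""
    n_bins = math.ceil((end - start) / window)
    counts = [0] * n_bins
    for p in positions:
        idx = min(int((p - start) // window), n_bins - 1)
        counts[idx] += 1
    return counts


def aic(positions, window, start, end):
    """AIC of a histogram (step-function) density with the given window size."""
    counts = bin_counts(positions, window, start, end)
    n = len(positions)
    # Histogram density in a bin is count / (n * window).
    loglik = sum(c * math.log(c / (n * window)) for c in counts if c > 0)
    n_params = len(counts) - 1  # bin probabilities sum to one
    return -2.0 * loglik + 2.0 * n_params


def cv_loglik(positions, window, start, end):
    """Leave-one-out CV log-likelihood of the histogram density (assumed form)."""
    counts = bin_counts(positions, window, start, end)
    n = len(positions)
    total = 0.0
    for c in counts:
        if c == 1:
            # The held-out read falls in a bin that is empty without it.
            return -math.inf
        if c > 1:
            total += c * math.log((c - 1) / ((n - 1) * window))
    return total


def select_window(positions, windows, start, end):
    """Return (window minimizing AIC, window maximizing CV log-likelihood)."""
    best_aic = min(windows, key=lambda w: aic(positions, w, start, end))
    best_cv = max(windows, key=lambda w: cv_loglik(positions, w, start, end))
    return best_aic, best_cv


def simulate_reads(n_reads, seed=0):
    """Hypothetical data: read starts on [0, 10000), 3x denser in [4000, 6000)."""
    rng = random.Random(seed)
    reads = []
    while len(reads) < n_reads:
        p = rng.randrange(0, 10000)
        accept_prob = 1.0 if 4000 <= p < 6000 else 1.0 / 3.0
        if rng.random() < accept_prob:
            reads.append(p)
    return reads


if __name__ == "__main__":
    reads = simulate_reads(2000)
    candidates = [50, 100, 200, 500, 1000, 2000, 5000]
    for w in candidates:
        print(w, round(aic(reads, w, 0, 10000), 1), cv_loglik(reads, w, 0, 10000))
    print("selected (AIC, CV):", select_window(reads, candidates, 0, 10000))
```

In this sketch a window that leaves any read alone in its bin gets a CV log-likelihood of negative infinity, so very small windows are excluded automatically under the CV criterion, while AIC penalizes them through the parameter count.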