Estimating optimal window size for analysis of low-coverage next-generation sequence data
Authors: | Ibrahim Nafisah, Charles C. Taylor, Henry M. Wood, Stefano Berri, Arief Gusnanto, Pamela Rabbitts |
---|---|
Year of publication: | 2014 |
Subject: |
Statistics and Probability
Lung Neoplasms; Computer science; Context (language use); computer.software_genre; Biochemistry; Humans; Molecular Biology; Likelihood Functions; Sequence; Genome Human; High-Throughput Nucleotide Sequencing; Window (computing); Contrast (statistics); Genomics; Sequence Analysis DNA; Function (mathematics); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Step function; Data mining; Akaike information criterion; computer; Algorithm; Next generation sequence |
Source: | Bioinformatics. 30:1823-1829 |
ISSN: | 1367-4811 1367-4803 |
Description: | Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing ( Results: We assume the read density to be a step function. Given this model, we propose a data-based estimate of the optimal window size based on Akaike’s information criterion (AIC) and cross-validation (CV) log-likelihood. By plotting the AIC and CV log-likelihood curves as functions of window size, we are able to estimate the optimal window size that minimizes AIC or maximizes CV log-likelihood. The proposed methods are general purpose, and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets. Availability and implementation: An R package to estimate optimal window size is available at http://www1.maths.leeds.ac.uk/~arief/R/win/. Contact: a.gusnanto@leeds.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
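The window-size selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R package, and the formulas are assumptions made for illustration: the read density is treated as an equal-width histogram over the region, AIC counts one free parameter per window minus one, and the CV criterion is the standard leave-one-out log-likelihood of a histogram. The function names (`bin_counts`, `aic`, `cv_loglik`, `best_windows`) and the simulated data are hypothetical.

```python
import math
import random
from collections import Counter


def bin_counts(positions, window):
    """Count reads per window; a read at position p falls in bin p // window."""
    return Counter(p // window for p in positions)


def aic(positions, region_length, window):
    """AIC of a step-function (histogram) density with the given window size."""
    n = len(positions)
    n_bins = math.ceil(region_length / window)
    counts = bin_counts(positions, window)
    # Maximised log-likelihood: a read in bin k has density n_k / (n * window).
    loglik = sum(c * math.log(c / (n * window)) for c in counts.values())
    # Assumption: bin heights integrate to one, so n_bins - 1 are free.
    return -2.0 * loglik + 2.0 * (n_bins - 1)


def cv_loglik(positions, window):
    """Leave-one-out CV log-likelihood of the same histogram density."""
    n = len(positions)
    counts = bin_counts(positions, window)
    total = 0.0
    for c in counts.values():
        if c == 1:
            # A read alone in its bin gets zero density once it is held out.
            return -math.inf
        total += c * math.log((c - 1) / ((n - 1) * window))
    return total


def best_windows(positions, region_length, candidates):
    """Return (window minimising AIC, window maximising CV log-likelihood)."""
    by_aic = min(candidates, key=lambda w: aic(positions, region_length, w))
    by_cv = max(candidates, key=lambda w: cv_loglik(positions, w))
    return by_aic, by_cv


if __name__ == "__main__":
    # Simulated region of 100 kb whose second half has three times as many reads.
    rng = random.Random(0)
    reads = [rng.randrange(0, 50_000) for _ in range(500)]
    reads += [rng.randrange(50_000, 100_000) for _ in range(1500)]
    candidates = [1_000, 2_000, 5_000, 10_000, 25_000, 50_000]
    print(best_windows(reads, 100_000, candidates))
```

Evaluating both criteria over a grid of candidate windows mirrors the plots mentioned in the abstract: very small windows are penalised by the parameter count in AIC and by held-out reads falling into otherwise empty bins in CV, while very large windows smooth over real changes in read density.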