De novo detection of copy number variation by co-assembly
Author: | Marcel J. T. Reinders, Marcel van den Broek, Jean-Marc Daran, Dick de Ridder, Jurgen F. Nijkamp, Jan-Maarten A. Geertman |
---|---|
Year of publication: | 2012 |
Subject: |
Statistics and Probability; Genetics; DNA Copy Number Variations; Contig; Sequence analysis; High-Throughput Nucleotide Sequencing; Sequence assembly; Genomics; Sequence Analysis, DNA; Computational biology; Biology; Biochemistry; Genome; Computer Science Applications; Saccharomyces; Computational Mathematics; Computational Theory and Mathematics; Copy-number variation; Genome, Fungal; Molecular Biology; Algorithms; Reference genome; Integer (computer science) |
Source: | Bioinformatics. 28:3195-3202 |
ISSN: | 1367-4811 1367-4803 |
DOI: | 10.1093/bioinformatics/bts601 |
Description: | Motivation: Comparing genomes of individual organisms using next-generation sequencing data is, until now, mostly performed using a reference genome. This is challenging when the reference is distant and introduces bias towards the exact sequence present in the reference. Recent improvements in both sequencing read length and efficiency of assembly algorithms have brought direct comparison of individual genomes by de novo assembly, rather than through a reference genome, within reach. Results: Here, we develop and test an algorithm, named Magnolya, that uses a Poisson mixture model for copy number estimation of contigs assembled from sequencing data. We combine this with co-assembly to allow de novo detection of copy number variation (CNV) between two individual genomes, without mapping reads to a reference genome. In co-assembly, multiple sequencing samples are combined, generating a single contig graph with different traversal counts for the nodes and edges between the samples. In the resulting ‘coloured’ graph, the contigs have integer copy numbers; this negates the need to segment genomic regions based on depth of coverage, as required for mapping-based detection methods. Magnolya is then used to assign integer copy numbers to contigs, after which CNV probabilities are easily inferred. The copy number estimator and CNV detector perform well on simulated data. Application of the algorithms to hybrid yeast genomes showed allotriploid content from different origin in the wine yeast Y12, and extensive CNV in aneuploid brewing yeast genomes. Integer CNV was also accurately detected in a short-term laboratory-evolved yeast strain. Availability: Magnolya is implemented in Python and available at: http://bioinformatics.tudelft.nl/ Contact: d.deridder@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
External link: |
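
The description mentions a Poisson mixture model that assigns integer copy numbers to contigs, followed by inference of CNV probabilities between samples. The sketch below shows one way such a model could be fitted with expectation-maximisation. It is not Magnolya's code. It assumes that the read count on a contig of length L with copy number k is Poisson distributed with mean k · lam · L, that most contigs are single copy (used only to initialise lam), and that the two samples are independent when computing the CNV probability. The function names, the `max_cn` parameter and the toy data are all hypothetical.

```python
import math
from statistics import median


def poisson_logpmf(x, mu):
    """Log of the Poisson probability mass function."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def fit_copy_numbers(counts, lengths, max_cn=4, n_iter=50):
    """EM for a Poisson mixture whose k-th component has mean k * lam * length.

    Returns (posteriors, lam): posteriors[i][k - 1] is the posterior
    probability that contig i has copy number k, and lam is the estimated
    per-base, per-copy read rate.
    """
    ks = list(range(1, max_cn + 1))
    # Assumption: most contigs are single copy, so the median per-base
    # rate is a reasonable starting value for lam.
    lam = median(c / l for c, l in zip(counts, lengths))
    weights = [1.0 / max_cn] * max_cn
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over copy numbers for each contig (log-sum-exp).
        posteriors = []
        for c, l in zip(counts, lengths):
            logp = [math.log(max(w, 1e-300)) + poisson_logpmf(c, k * lam * l)
                    for w, k in zip(weights, ks)]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            posteriors.append([u / total for u in unnorm])
        # M-step: update mixing weights and the shared rate lam.
        weights = [sum(p[j] for p in posteriors) / len(counts)
                   for j in range(max_cn)]
        exposure = sum(p[j] * ks[j] * l
                       for p, l in zip(posteriors, lengths)
                       for j in range(max_cn))
        lam = sum(counts) / exposure
    return posteriors, lam


def map_copy_numbers(posteriors):
    """Most probable integer copy number for each contig."""
    return [max(range(len(p)), key=p.__getitem__) + 1 for p in posteriors]


def cnv_probability(post_a, post_b):
    """P(copy numbers differ) for one contig, assuming independent samples."""
    return 1.0 - sum(a * b for a, b in zip(post_a, post_b))


# Toy data (hypothetical): read counts on seven contigs of 1000 bp each.
counts = [100, 105, 95, 200, 210, 98, 300]
lengths = [1000] * 7
posteriors, lam = fit_copy_numbers(counts, lengths)
copy_numbers = map_copy_numbers(posteriors)
print(copy_numbers, round(lam, 4))
```

Sharing a single rate lam across all components is what ties the mixture to integer copy numbers: each component's mean is a whole multiple of the single-copy mean, so no separate segmentation of coverage is needed. The fit is only identifiable up to the choice of which peak counts as single copy, which is why the initial value of lam matters.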