Haplotype phasing in single-cell DNA-sequencing data

Author:	Benjamin J. Raphael, Gryte Satas
Publication year:	2018
Subject:	Male; 0301 basic medicine; Statistics and Probability; Breast Neoplasms; Single-nucleotide polymorphism; Computational biology; Biology; Polymorphism, Single Nucleotide; Biochemistry; Genome; DNA sequencing; 03 medical and health sciences; Gene Frequency; Humans; Allele; Molecular Biology; Allele frequency; Gene; ISMB 2018–Intelligent Systems for Molecular Biology Proceedings; Neurons; Whole Genome Sequencing; Genome, Human; Haplotype; High-Throughput Nucleotide Sequencing; Genomics; Amplicon; Diploidy; Computer Science Applications; Genomic Variation Analysis; Computational Mathematics; 030104 developmental biology; Haplotypes; Computational Theory and Mathematics; Female; Single-Cell Analysis; Algorithms; Software
Source:	Bioinformatics
ISSN:	1367-4811 1367-4803
DOI:	10.1093/bioinformatics/bty286
Description:	Motivation: Current technologies for single-cell DNA sequencing require whole-genome amplification (WGA), as a single cell contains too little DNA for direct sequencing. Unfortunately, WGA introduces biases in the resulting sequencing data, including non-uniformity in genome coverage and high rates of allele dropout. These biases complicate many downstream analyses, including the detection of genomic variants. Results: We show that amplification biases have a potential upside: long-range correlations in rates of allele dropout provide a signal for phasing haplotypes at the lengths of amplicons from WGA, which are generally longer than individual sequence reads. We describe a statistical test to measure concurrent allele dropout between single-nucleotide polymorphisms (SNPs) across multiple sequenced single cells. We use the results of this test to perform haplotype assembly across a collection of single cells. We demonstrate that the algorithm predicts phasing between pairs of SNPs with higher accuracy than phasing from reads alone. Using whole-genome sequencing data from only seven neural cells, we obtain haplotype blocks that are orders of magnitude longer than with sequence reads alone (median length 10.2 kb versus 312 bp), with error rates Availability and implementation: Source code is available at https://www.github.com/raphael-group. Supplementary information: Supplementary data are available at Bioinformatics online.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87ab6dbd6108b7db647b3bd47fdc0e62 https://doi.org/10.1093/bioinformatics/bty286