Epiclomal: probabilistic clustering of sparse single-cell DNA methylation data
Authors: | Daniel Lai, Camila P. E. de Souza, Michelle Moksa, Edmund Su, Sohrab P. Shah, Tony Hui, Martin Hirst, Qi Cao, Beixi Wang, Patricia Ye, Samuel Aparicio, Emma Laks, Farhia Kabeer, Jazmine Brimhall, Mirela Andronescu, Tehmina Masud, Marcus Wong, Richard A. Moore, Justina Biele |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine; Computer science; Probabilistic clustering; Biochemistry; Genome; Mathematical and Statistical Techniques; 0302 clinical medicine; Breast Tumors; Basic Cancer Research; Medicine and Health Sciences; Cluster Analysis; Biology (General); DNA methylation; Sex Chromosomes; Ecology; Chemical Reactions; X Chromosomes; Genomics; Methylation; Chromatin; Nucleic acids; Chemistry; Oncology; Computational Theory and Mathematics; CpG site; Modeling and Simulation; Physical Sciences; Epigenetics; Single-Cell Analysis; DNA modification; Chromatin modification; Research Article; Chromosome biology; Cell biology; QH301-705.5; Computational biology; Research and Analysis Methods; Chromosomes; 03 medical and health sciences; Cellular and Molecular Neuroscience; Cancer Genomics; Genomic Medicine; Breast Cancer; Genetics; Humans; Molecular Biology Techniques; Hierarchical Clustering; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Probability; Biology and life sciences; Cancers and Neoplasms; Sequence Analysis, DNA; DNA; Python (programming language); Missing data; Mixture model; Hierarchical clustering; 030104 developmental biology; CpG Islands; Gene expression; 030217 neurology & neurosurgery; Cloning |
Source: | PLoS Computational Biology, Vol 16, Iss 9, p e1008270 (2020) |
DOI: | 10.1101/414482 |
Description: | We present Epiclomal, a probabilistic clustering method arising from a hierarchical mixture model to simultaneously cluster sparse single-cell DNA methylation data and impute missing values. Using synthetic and published single-cell CpG datasets, we show that Epiclomal outperforms non-probabilistic methods and can handle the inherent missing data characteristic that dominates single-cell CpG genome sequences. Using newly generated single-cell 5mCpG sequencing data, we show that Epiclomal discovers sub-clonal methylation patterns in aneuploid tumour genomes, thus defining epiclones that can match or transcend copy number-determined clonal lineages and opening up an important form of clonal analysis in cancer. Epiclomal is written in R and Python and is available at https://github.com/shahcompbio/Epiclomal. Author summary: DNA methylation is an epigenetic mark that occurs when methyl groups are attached to the DNA molecule, thereby playing decisive roles in numerous biological processes. Advances in technology have allowed the generation of high-throughput DNA methylation sequencing data from single cells. One of the goals is to group cells according to their DNA methylation profiles; however, a major challenge arises due to a large amount of missing data per cell. To address this problem, we developed a novel statistical model and framework: Epiclomal. Our approach uses a hierarchical mixture model to borrow statistical strength across cells and neighboring loci to accurately define cell groups (clusters). We compare our approach to different methods on both synthetic and published datasets. We show that Epiclomal is more robust than other approaches, producing more accurate clusters of cells in the majority of experimental scenarios. We also apply Epiclomal to newly generated single-cell DNA methylation data from breast cancer xenografts. 
Our results show that methylation-based clusters can mirror or in some instances transcend the clusters defined by single-cell copy number analysis. This illustrates the importance of single-cell DNA methylation analysis in understanding cellular heterogeneity in cancer. |
Database: | OpenAIRE |
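
Illustrative sketch (not part of the record): the description refers to clustering sparse binary methylation calls with a mixture model while imputing missing values. The Python snippet below shows that general idea in a deliberately simplified form. It is a plain Bernoulli mixture fitted by EM, not the Epiclomal model or its code: it omits the hierarchical structure that borrows strength across neighboring loci, and the function name `fit_bernoulli_mixture`, its parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_bernoulli_mixture(X, n_clusters, n_iter=50, seed=0):
    """EM for a Bernoulli mixture; NaN entries of X are treated as missing.

    X is a cells x CpG-sites matrix of 0/1 calls. Hypothetical helper,
    not the Epiclomal API.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                  # True where a CpG call exists
    X0 = np.where(observed, X, 0.0)          # zero-fill so missing sites drop out
    n_cells, n_sites = X.shape
    weights = np.full(n_clusters, 1.0 / n_clusters)
    theta = rng.uniform(0.25, 0.75, size=(n_clusters, n_sites))
    for _ in range(n_iter):
        # E-step: log-likelihood of each cell under each cluster,
        # summed over observed sites only.
        log_lik = (X0 @ np.log(theta).T
                   + ((1.0 - X0) * observed) @ np.log1p(-theta).T)
        log_post = np.log(weights) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: smoothed methylation rate per cluster and site,
        # and smoothed cluster weights.
        methylated = resp.T @ X0
        covered = resp.T @ observed.astype(float)
        theta = (methylated + 1.0) / (covered + 2.0)
        weights = (resp.sum(axis=0) + 1.0) / (n_cells + n_clusters)
    labels = resp.argmax(axis=1)
    # Impute each missing call with its posterior-weighted cluster rate.
    imputed = np.where(observed, X, resp @ theta)
    return labels, theta, imputed

# Toy data (assumed): 2 cell groups, 40 cells x 200 sites, about 70% missing.
rng = np.random.default_rng(1)
profiles = np.zeros((2, 200))
profiles[0, :100] = 1.0
profiles[1, 100:] = 1.0
truth = np.repeat([0, 1], 20)
X = profiles[truth].copy()
X[rng.random(X.shape) < 0.7] = np.nan

labels, theta, imputed = fit_bernoulli_mixture(X, n_clusters=2)
```

Observed calls are left untouched in `imputed`; only missing entries are filled, each with a value between 0 and 1 that reflects how confidently the cell is assigned to each cluster.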