BnpC: Bayesian non-parametric clustering of single-cell mutation profiles
Author: | Abel Gonzalez-Perez, Nuria Lopez-Bigas, Jose Bonet, Francesco Marass, Nico Borgsmüller, Niko Beerenwinkel |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Statistics and Probability; Non parametric clustering; AcademicSubjects/SCI01060; Bayesian probability; Cell; Tumor initiation; Computational biology; Biology; Biochemistry; 03 medical and health sciences; 0302 clinical medicine; medicine; Cluster Analysis; Treatment resistance; Molecular Biology; 030304 developmental biology; 0303 health sciences; Sequence Analysis, RNA; Cancer; Bayes Theorem; medicine.disease; Original Papers; Computer Science Applications; Computational Mathematics; medicine.anatomical_structure; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Mutation; Mutation (genetic algorithm); Single-Cell Analysis; Sequence Analysis; Algorithms; Software |
Source: | Bioinformatics, 36 (19) |
ISSN: | 1367-4803, 1460-2059 |
Description: | Motivation: The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intratumor heterogeneity (ITH) by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq datasets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Results: Here, we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq datasets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime and scalability. Its inferred genotypes were the most accurate, especially on highly heterogeneous data, and it was the only method able to run and produce results on datasets with 5000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever-growing scDNA-seq datasets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve ITH but also as a preprocessing step to reduce data size. Availability and implementation: BnpC is freely available under the MIT license at https://github.com/cbg-ethz/BnpC. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |