A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data
Authors: | Anna Schuh, Jenny C. Taylor, Dimitrios V Vavoulis, Anthony Cutts |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Statistics and Probability; AcademicSubjects/SCI01060; Computer science; Computational biology; computer.software_genre; Genome; Biochemistry; Synthetic data; DNA sequencing; symbols.namesake; 03 medical and health sciences; 0302 clinical medicine; Neoplasms; Tumor Microenvironment; medicine; Humans; Exome; Liquid biopsy; Gaussian process; Molecular Biology; 030304 developmental biology; Whole genome sequencing; 0303 health sciences; Lymphocytic leukaemia; Whole Genome Sequencing; High-Throughput Nucleotide Sequencing; Cancer; Genome Analysis; medicine.disease; Original Papers; 3. Good health; Computer Science Applications; Computational Mathematics; Cross-Sectional Studies; Computational Theory and Mathematics; 030220 oncology & carcinogenesis; Mutation; Mutation (genetic algorithm); Cancer cell; symbols; Data mining; Deconvolution; computer; Software; 030217 neurology & neurosurgery |
Source: | Bioinformatics |
DOI: | 10.1101/2020.01.20.913236 |
Description: | Tumours are composed of genotypically and phenotypically distinct cancer cell populations (clones), which undergo Darwinian evolution in response to changes in their local micro-environment, such as drug treatment. In a cancer patient, this continuous adaptation can be studied through next-generation sequencing of multiple tumour samples, combined with appropriate bioinformatics and statistical methodologies. One family of statistical methods for clonal deconvolution seeks to identify groups of mutations and to estimate the prevalence of each group in the tumour, while taking into account the tumour's purity and copy number profile. These methods have been applied to cross-sectional data and also to longitudinal data, in the latter case by discarding information on the timing of sample collection. Two key questions are how, in the case of longitudinal data, such timing information can be incorporated into the analysis, and whether there is any benefit in doing so. Regarding the first question, we incorporated information on the temporal spacing of longitudinally collected samples into standard non-parametric approaches for clonal deconvolution by modelling the time dependence of the prevalence of each clone as a Gaussian process. This permitted reconstruction of the temporal profile of the abundance of each clone in continuous time from a small number of sparsely collected samples, without any strong prior assumptions on the functional form of this profile. Regarding the second question, we tested various model configurations on a range of whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma, and on synthetic data. We demonstrate that incorporating temporal information in the analysis improves model performance, provided that data of sufficient volume and complexity are available for estimating the free model parameters. 
We expect that our approach will be useful in cases where collecting a relatively long sequence of tumour samples is feasible, as in the case of liquid cancers (e.g. leukaemia) and liquid biopsies. The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP. |
Database: | OpenAIRE |
External link: |
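
The description above says that the time dependence of each clone's prevalence is modelled as a Gaussian process, so that a continuous temporal profile can be reconstructed from sparsely and unevenly collected samples. The following is a minimal illustrative sketch of that idea only, not the clonosGP implementation. The squared-exponential kernel, the logit transform, the hyperparameter values, the sampling times and the prevalence values are all assumptions chosen for illustration, and the prevalences are treated here as already estimated, whereas the paper describes inferring them jointly within a clonal deconvolution model.

```python
import numpy as np


def sq_exp_kernel(t1, t2, amplitude=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(t_obs, y_obs, t_new, amplitude=1.0, lengthscale=1.0, noise_sd=0.1):
    """Posterior mean and covariance of a zero-mean GP at the times t_new."""
    K = sq_exp_kernel(t_obs, t_obs, amplitude, lengthscale)
    K += noise_sd**2 * np.eye(len(t_obs))          # observation noise on the diagonal
    K_s = sq_exp_kernel(t_new, t_obs, amplitude, lengthscale)
    K_ss = sq_exp_kernel(t_new, t_new, amplitude, lengthscale)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, K_ss - v.T @ v


def logit(p):
    return np.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical data: unevenly spaced sampling times (months) and the
# prevalence of a single clone at each time. These numbers are made up.
t_obs = np.array([0.0, 3.0, 4.0, 9.0, 15.0])
phi_obs = np.array([0.80, 0.55, 0.40, 0.15, 0.30])

# Work on the logit scale so that the reconstructed profile stays in (0, 1).
t_new = np.linspace(0.0, 15.0, 61)
mean_latent, cov_latent = gp_posterior(
    t_obs, logit(phi_obs), t_new, amplitude=2.0, lengthscale=4.0, noise_sd=0.2
)
phi_curve = expit(mean_latent)                     # continuous prevalence profile
latent_sd = np.sqrt(np.clip(np.diag(cov_latent), 0.0, None))
```

Because the kernel depends only on the time difference between samples, the actual spacing of the collection dates enters the model directly: closely spaced samples constrain each other strongly, while widely separated ones are nearly independent. The uncertainty `latent_sd` is on the logit scale and grows in the gaps between sampling times.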