A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data

Authors: Anna Schuh, Jenny C. Taylor, Dimitrios V Vavoulis, Anthony Cutts
Language: English
Year of publication: 2020
Subject:
Statistics and Probability
AcademicSubjects/SCI01060
Computer science
Computational biology
computer.software_genre
Genome
Biochemistry
Synthetic data
DNA sequencing
symbols.namesake
03 medical and health sciences
0302 clinical medicine
Neoplasms
Tumor Microenvironment
medicine
Humans
Exome
Liquid biopsy
Gaussian process
Molecular Biology
030304 developmental biology
Whole genome sequencing
0303 health sciences
Lymphocytic leukaemia
Whole Genome Sequencing
High-Throughput Nucleotide Sequencing
Cancer
Genome Analysis
medicine.disease
Original Papers
3. Good health
Computer Science Applications
Computational Mathematics
Cross-Sectional Studies
Computational Theory and Mathematics
030220 oncology & carcinogenesis
Mutation
Mutation (genetic algorithm)
Cancer cell
symbols
Data mining
Deconvolution
computer
Software
030217 neurology & neurosurgery
Source: Bioinformatics
DOI: 10.1101/2020.01.20.913236
Description: Tumours are composed of genotypically and phenotypically distinct cancer cell populations (clones), which are subject to a process of Darwinian evolution in response to changes in their local micro-environment, such as drug treatment. In a cancer patient, this process of continuous adaptation can be studied through next-generation sequencing of multiple tumour samples combined with appropriate bioinformatics and statistical methodologies. One family of statistical methods for clonal deconvolution seeks to identify groups of mutations and estimate the prevalence of each group in the tumour, while taking into account its purity and copy number profile. These methods have been used in the analysis of cross-sectional data, and also of longitudinal data, in which case information on the timing of sample collection is discarded. Two key questions are how, in the case of longitudinal data, such information can be incorporated in the analysis, and whether there is any benefit in doing so. Regarding the first question, we incorporated information on the temporal spacing of longitudinally collected samples into standard non-parametric approaches for clonal deconvolution by modelling the time dependence of the prevalence of each clone as a Gaussian process. This permitted reconstruction of the temporal profile of the abundance of each clone continuously from several sparsely collected samples and without any strong prior assumptions on the functional form of this profile. Regarding the second question, we tested various model configurations on a range of whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data. We demonstrate that incorporating temporal information in our analysis improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters.
We expect that our approach will be useful in cases where collecting a relatively long sequence of tumour samples is feasible, as in the case of liquid cancers (e.g. leukaemia) and liquid biopsies. The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP.
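The idea of treating a clone's prevalence as a smooth function of time can be illustrated with a minimal Gaussian process regression sketch in Python. This is not the clonosGP implementation and does not perform clonal deconvolution: it assumes the prevalence of a single clone has already been estimated at a few time points, and it only shows how a Gaussian process can interpolate that profile between sparsely spaced samples. The sampling times, prevalence values, squared-exponential kernel, kernel hyperparameters and the logit link (used here to keep prevalences between 0 and 1) are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(t1, t2, amplitude=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior_mean(t_obs, y_obs, t_new, amplitude=1.0, lengthscale=1.0, noise=0.1):
    """Posterior mean of a zero-mean GP at t_new, given noisy observations."""
    K = rbf_kernel(t_obs, t_obs, amplitude, lengthscale)
    K += noise**2 * np.eye(len(t_obs))  # observation noise on the diagonal
    K_star = rbf_kernel(t_new, t_obs, amplitude, lengthscale)
    return K_star @ np.linalg.solve(K, y_obs)


def logit(p):
    return np.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical data: prevalence of one clone at five unevenly spaced visits.
t_obs = np.array([0.0, 3.0, 7.0, 12.0, 20.0])         # months since first sample
phi_obs = np.array([0.10, 0.15, 0.35, 0.60, 0.80])    # assumed clone prevalence

# Fit on the logit scale, then map back so the curve stays within (0, 1).
t_grid = np.linspace(0.0, 20.0, 81)
phi_grid = sigmoid(
    gp_posterior_mean(t_obs, logit(phi_obs), t_grid,
                      amplitude=2.0, lengthscale=6.0, noise=0.2)
)
```

Because the kernel depends on the actual time differences between samples, unevenly spaced visits are handled naturally, which is the kind of timing information that a purely cross-sectional treatment discards.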
Database: OpenAIRE