Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA

Authors: Abraham Tzou, Jennifer Pecson, Tzu-Yu Liu, Signe Fransen, John St. John, David E. Weinberg, Riley Ennis, Yaping Liu, Brandon J. Rice, Daniel Delubac, Nathan Boley, Marvin Bertin, Katherine E. Niehaus, Leilani Young, Aarushi Sharma, Girish Putcha, Adam Drake, James Cregg, Erik Gafni, Nathan Wan, Catherina Tang, Derek Bowen, Brandon White, Imran S. Haque, Ajay Kannan, Mitch Bailey, Gabriel E. Sanderson, Eric A. Ariazi, Gabriel Otte, Loren Hansen
Year of publication: 2019
Subjects:
0301 basic medicine
Male
Cancer Research
Colorectal cancer
Plasma cell
computer.software_genre
Circulating Tumor DNA
Machine Learning
Cell-free DNA
0302 clinical medicine
Surgical oncology
Tumor stage
Medicine
Early-stage cancer
Aged, 80 and over
0303 health sciences
Confounding
Genomics
Middle Aged
lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens
medicine.anatomical_structure
Oncology
Cell-free fetal DNA
030220 oncology & carcinogenesis
Cohort
Screening
Female
Colorectal Neoplasms
Research Article
Early detection
Machine learning
lcsh:RC254-282
Free dna
03 medical and health sciences
Text mining
Genetics
Biomarkers, Tumor
Humans
030304 developmental biology
Aged
Neoplasm Staging
Whole genome sequencing
Whole-genome sequencing
business.industry
Genome, Human
Gene Expression Profiling
Computational Biology
Reproducibility of Results
medicine.disease
030104 developmental biology
ROC Curve
Artificial intelligence
business
Transcriptome
computer
Source: BMC Cancer
BMC Cancer, Vol 19, Iss 1, Pp 1-10 (2019)
ISSN: 1471-2407
Description:
Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.
Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using ichorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validation to assess generalization performance.
Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance.
Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
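The Methods and Results name several computations: read-count normalization, AUC, sensitivity at a fixed 85% specificity, and confounder-based cross-validation. This record does not say how any of them were implemented, so the following is only a minimal Python sketch under stated assumptions, not the authors' pipeline. Counts-per-million (CPM) stands in for the unspecified normalization. Confounder-based cross-validation is assumed to mean holding out every sample from one institution or sequencing batch per fold. All function names (cpm_normalize, auc, sensitivity_at_specificity, leave_one_group_out) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cpm_normalize(counts: Sequence[int]) -> List[float]:
    """Scale one sample's per-gene read counts to counts per million (CPM).

    Assumption: CPM is a stand-in; the study's normalization is not specified here.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: P(case score > control score), with ties counting 1/2.

    labels: 1 = cancer, 0 = non-cancer control; higher score = more cancer-like.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], specificity: float = 0.85
) -> Tuple[float, float]:
    """Return (threshold, sensitivity) such that at least `specificity` of controls
    score at or below the threshold. Samples strictly above it are called positive."""
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Number of controls that must be called negative; the small tolerance
    # absorbs floating-point error in the product.
    needed = max(1, math.ceil(specificity * len(controls) - 1e-9))
    threshold = controls[needed - 1]
    sensitivity = sum(s > threshold for s in cases) / len(cases)
    return threshold, sensitivity


def leave_one_group_out(groups: Sequence[str]) -> List[Tuple[List[int], List[int]]]:
    """Assumed confounder-based CV: each fold holds out all samples of one group
    (e.g. one institution or sequencing batch) as the test set."""
    members: Dict[str, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = []
    for g in sorted(members):
        held_out = set(members[g])
        train = [i for i in range(len(groups)) if i not in held_out]
        folds.append((train, members[g]))
    return folds
```

For example, suppose four cancers score [0.9, 0.8, 0.7, 0.4] and six controls score [0.35, 0.3, 0.2, 0.1, 0.6, 0.05]. Then `auc` returns 23/24. At 0.85 specificity, `sensitivity_at_specificity` places the threshold at the highest control score (0.6) and returns a sensitivity of 0.75. Holding out whole groups, rather than drawing random folds, is one way to expose institution- or batch-specific signal. If performance drops under group hold-out relative to k-fold, the model may be partly learning the confounder.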
Database: OpenAIRE