Predicting drug sensitivity of cancer cells based on DNA methylation levels

Authors: Julia L. Fleck, Sofia Pontes de Miranda, Stephen R. Piccolo, Fernanda Araujo Baião
Language: English
Year of publication: 2021
Subject:
Cancer Treatment
Biochemistry
Infographics
Machine Learning
Mathematical and Statistical Techniques
Feature (machine learning)
Medicine and Health Sciences
Data Management
Multidisciplinary
DNA methylation
Applied Mathematics
Simulation and Modeling
Statistics
Genomics
Chromatin
Random forest
Nucleic acids
Drug development
Oncology
Kernel (statistics)
Physical Sciences
Regression Analysis
Medicine
Epigenetics
DNA modification
Graphs
Algorithms
Chromatin modification
Research Article
Chromosome biology
Computer and Information Sciences
Cell biology
Science
Antineoplastic Agents
Computational biology
Biology
Research and Analysis Methods
Human Genomics
Artificial Intelligence
Genetics
Humans
Statistical Methods
Biology and life sciences
Data Visualization
Cancer
Cancers and Neoplasms
DNA
medicine.disease
Statistical classification
Gene expression
Mathematics
Source: PLoS ONE, Vol 16, Iss 9, p e0238757 (2021)
ISSN: 1932-6203
Description: Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we applied machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
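
The idea of training a classifier only on cell lines with relatively extreme drug responses, then scoring it by the area under the ROC curve, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the data are synthetic, the nearest-centroid scorer is a simple stand-in for the distance-based family mentioned in the description, and every name here (`select_extremes`, `auroc`, `demo`, the `fraction` parameter, the median split used to label held-out lines) is an assumption made for illustration.

```python
import random


def select_extremes(responses, fraction):
    """Pick the lowest and highest `fraction` of responses (hypothetical helper).

    Returns (indices, labels), where label 1 marks the low-response
    ("sensitive") tail and label 0 marks the high-response ("resistant") tail.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    k = max(1, int(len(responses) * fraction))
    return order[:k] + order[-k:], [1] * k + [0] * k


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(row, c_pos, c_neg):
    """Higher score means closer to the sensitive centroid than the resistant one."""
    d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
    return d_neg - d_pos


def demo(seed=0, n_lines=200, n_features=20, fraction=0.25):
    """Train on extreme responders in synthetic data, evaluate on held-out lines."""
    rng = random.Random(seed)
    # Synthetic methylation-like values in [0, 1]; NOT real GDSC data.
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_lines)]
    # Assumed toy response: driven by the first feature plus Gaussian noise.
    y = [row[0] + rng.gauss(0, 0.2) for row in X]

    split = n_lines // 2
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]

    # Keep only the extreme tails of the training responses.
    idx, labels = select_extremes(y_train, fraction)
    c_pos = centroid([X_train[i] for i, l in zip(idx, labels) if l == 1])
    c_neg = centroid([X_train[i] for i, l in zip(idx, labels) if l == 0])

    # Label held-out lines by a median split (an assumption of this sketch).
    median = sorted(y_test)[len(y_test) // 2]
    test_labels = [1 if v < median else 0 for v in y_test]
    scores = [score(row, c_pos, c_neg) for row in X_test]
    return auroc(test_labels, scores)


if __name__ == "__main__":
    print(f"AUROC on held-out synthetic lines: {demo():.3f}")
```

Varying `fraction` between a small value and 0.5 gives a rough feel for the trade-off the description refers to: narrower tails yield cleaner class separation but fewer training examples.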
Database: OpenAIRE