Feature selection strategies for drug sensitivity prediction

Authors: Eike Staub, Krzysztof Koras, Julian Kreis, Dilafruz Juraeva, Ewa Szczurek, Johanna Mazur
Language: English
Year of publication: 2020
Subject:
Support Vector Machine
Computer science
Datasets as Topic
lcsh:Medicine
0302 clinical medicine
Neoplasms
Oximes
Gene expression
Computational models
Molecular Targeted Therapy
Precision Medicine
lcsh:Science
media_common
Interpretability
0303 health sciences
Multidisciplinary
Imidazoles
Prognosis
3. Good health
Feature (computer vision)
030220 oncology & carcinogenesis
Signal Transduction
medicine.drug
Drug
Proto-Oncogene Proteins B-raf
media_common.quotation_subject
Predictive medicine
Antineoplastic Agents
Genomics
Feature selection
Computational biology
Article
03 medical and health sciences
Machine learning
medicine
Humans
Computer Simulation
Sensitivity (control systems)
030304 developmental biology
Mechanism (biology)
business.industry
lcsh:R
Cancer
Dabrafenib
medicine.disease
Precision medicine
Drug Resistance
Neoplasm

Drug Design
Test set
Cancer cell
lcsh:Q
Personalized medicine
Transcriptome
business
Source: Scientific Reports, Vol 10, Iss 1, Pp 1-12 (2020)
ISSN: 2045-2322
DOI: 10.1038/s41598-020-65927-9
Description: Drug sensitivity prediction is one of the main challenges in personalized medicine. The major difficulty stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is key to interpretable results and to the identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has so far not been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model that uses genome-wide gene expression features and with common data-driven feature selection techniques. In total, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the models that are most predictive of drug response for 60 drugs, with Dabrafenib as the best-performing example. Examples of how pre-selection of features benefits model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication.
For a significant group of compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action, and can serve as a guideline for prescription. In general, choosing appropriate feature selection strategies has the potential to yield interpretable models that are informative for therapy design.
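As a rough illustration of two ingredients named in the description (restricting the features to a prior-knowledge gene set, and scoring predictions by the correlation between observed and predicted response), a minimal Python sketch follows. The function names `select_features` and `pearson_r`, the gene list, and all toy numbers are hypothetical; they are not taken from the study or from GDSC, and the sketch does not reproduce the authors' pipeline.

```python
import math


def select_features(expression, gene_names, prior_genes):
    """Keep only the expression columns whose gene is in the prior-knowledge set.

    expression: list of rows (one per cell line), each a list of values.
    gene_names: column labels, same length as each row.
    prior_genes: set of gene names assumed to be drug targets or pathway members.
    """
    keep = [i for i, gene in enumerate(gene_names) if gene in prior_genes]
    selected_names = [gene_names[i] for i in keep]
    selected = [[row[i] for i in keep] for row in expression]
    return selected, selected_names


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted responses."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Toy data (invented for illustration): 3 cell lines x 4 genes.
gene_names = ["BRAF", "MAP2K1", "TP53", "GAPDH"]
expression = [
    [5.1, 3.2, 7.0, 9.9],
    [2.4, 3.0, 6.8, 9.7],
    [4.0, 2.1, 7.1, 9.8],
]
# Hypothetical prior-knowledge set for a BRAF-targeting drug.
prior_genes = {"BRAF", "MAP2K1"}

reduced, reduced_names = select_features(expression, gene_names, prior_genes)
```

A model would then be trained on `reduced` instead of the full matrix, and `pearson_r` applied to its held-out predictions; which learner to use is left open here.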
Database: OpenAIRE