Feature selection strategies for drug sensitivity prediction
Authors: | Eike Staub, Krzysztof Koras, Julian Kreis, Dilafruz Juraeva, Ewa Szczurek, Johanna Mazur |
---|---|
Language: | English |
Publication year: | 2020 |
Subject: |
Support Vector Machine; Computer science; Datasets as Topic; lcsh:Medicine; 0302 clinical medicine; Neoplasms; Oximes; Gene expression; Computational models; Molecular Targeted Therapy; Precision Medicine; lcsh:Science; media_common; Interpretability; 0303 health sciences; Multidisciplinary; Imidazoles; Prognosis; 3. Good health; Feature (computer vision); 030220 oncology & carcinogenesis; Signal Transduction; medicine.drug; Drug; Proto-Oncogene Proteins B-raf; media_common.quotation_subject; Predictive medicine; Antineoplastic Agents; Genomics; Feature selection; Computational biology; Article; 03 medical and health sciences; Machine learning; medicine; Humans; Computer Simulation; Sensitivity (control systems); 030304 developmental biology; Mechanism (biology); business.industry; lcsh:R; Cancer; Dabrafenib; medicine.disease; Precision medicine; Drug Resistance Neoplasm; Drug Design; Test set; Cancer cell; lcsh:Q; Personalized medicine; Transcriptome; business |
Source: | Scientific Reports, Vol 10, Iss 1, Pp 1-12 (2020) |
ISSN: | 2045-2322 |
DOI: | 10.1038/s41598-020-65927-9 |
Description: | Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. The major difficulty of this problem stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is the key to interpretable results and identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has so far not been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model utilizing genome-wide gene expression features and common data-driven feature selection techniques. Together, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields models that are most predictive of drug response for 60 drugs, with the best performing example of Dabrafenib. Examples of how pre-selection of features benefits the model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with the genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication. 
For a significant group of the compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action, and can be used as a guideline for their prescription. In general, choosing appropriate feature selection strategies has the potential to yield interpretable models that are indicative for therapy design. |
Database: | OpenAIRE |
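
The description contrasts genome-wide features with features pre-selected from prior knowledge of drug targets, and reports performance as the correlation of observed and predicted response on a test set. The sketch below illustrates that general idea on synthetic data only. It is not the authors' pipeline: the gene list, the simulated response, the ridge model fitted by gradient descent, and every function name are assumptions made for illustration.

```python
import math
import random


def select_features(expression, gene_names, prior_genes):
    """Keep only the columns whose gene is in the prior-knowledge set."""
    keep = [j for j, g in enumerate(gene_names) if g in prior_genes]
    selected = [[row[j] for j in keep] for row in expression]
    return selected, [gene_names[j] for j in keep]


def fit_ridge(X, y, alpha=0.1, lr=0.05, epochs=500):
    """Fit linear weights and a bias by gradient descent on a ridge loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The penalty term alpha * wj shrinks the weights towards zero.
        w = [wj - lr * (gw / n + alpha * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in for an expression matrix (rows: cell lines, columns: genes).
# Gene names and the simulated response are illustrative assumptions, not GDSC data.
random.seed(0)
gene_names = ["BRAF", "MAP2K1"] + [f"GENE{i}" for i in range(20)]
expression = [[random.gauss(0, 1) for _ in gene_names] for _ in range(200)]
response = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 0.5) for row in expression]

# Hypothetical prior-knowledge feature set for a targeted drug.
prior_genes = {"BRAF", "MAP2K1"}
X, kept = select_features(expression, gene_names, prior_genes)

# Hold out the last 50 cell lines as a test set.
X_train, X_test = X[:150], X[150:]
y_train, y_test = response[:150], response[150:]

w, b = fit_ridge(X_train, y_train)
r = pearson(y_test, predict(X_test, w, b))
print(kept, round(r, 2))
```

Because the model sees only two named features, its weights can be read directly as the contribution of each gene, which is the interpretability benefit the description attributes to pre-selection.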