Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression
Authors: | Daoud Meerzaman, Yu Fan, Ophir Gal, Noam Auslander |
---|---|
Year of publication: | 2019 |
Subject: | remission induction; complete remission; acute myeloid leukemia (AML); myeloid leukemia; machine learning (ML); artificial intelligence; gene expression; gene expression profiling; biomedical data; Cancer Research; Oncology; Medicine; Short Report; 0301 basic medicine; 030104 developmental biology; 03 medical and health sciences; 0302 clinical medicine; 030220 oncology & carcinogenesis; lcsh:RC254-282; lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens |
Source: | Cancer Informatics, Vol 18 (2019) |
ISSN: | 1176-9351 |
DOI: | 10.1177/1176935119835544 |
Description: | Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend toward applying ML techniques in precision medicine, here we present an ML technique that predicts the likelihood of complete remission (CR) in patients diagnosed with acute myeloid leukemia (AML). In this study, we explored whether ML algorithms designed to analyze gene-expression patterns obtained through RNA sequencing (RNA-seq) can accurately predict the likelihood of CR in pediatric AML patients who have received induction therapy. We employed tests of statistical significance to determine which genes were differentially expressed between the samples derived from patients who achieved CR after 2 courses of treatment and the samples taken from patients who did not benefit. We tuned classifier hyperparameters to optimize performance and used multiple methods to guide our feature selection as well as our assessment of algorithm performance. To identify the model that performed best within the context of this study, we plotted receiver operating characteristic (ROC) curves. Using the top 75 genes from the k-nearest neighbors (K-NN) model (K = 27) yielded the best area-under-the-curve (AUC) score that we obtained: 0.84. When we finally evaluated the models on the previously unseen test data set, the top 50 genes yielded the best AUC = 0.81. Pathway enrichment analysis for these 50 genes showed that the guanosine diphosphate fucose (GDP-fucose) biosynthesis pathway is the most significant, with an adjusted P value = .0092, which may suggest the vital role of N-glycosylation in AML. |
Database: | OpenAIRE |
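
The workflow summarized in the description (rank genes by a significance test, keep the top N, score samples with k-nearest neighbors, and compare models by ROC AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, Welch's t-statistic is an assumed stand-in for the unspecified "tests of statistical significance", and all function names (`welch_t`, `top_genes`, `knn_score`, `roc_auc`, `make_data`) are hypothetical. Only K = 27 and the top-50 gene cutoff are taken from the record above.

```python
import math
import random


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed stand-in test)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / denom if denom > 0 else 0.0


def top_genes(X, y, n):
    """Indices of the n genes with the largest |t| between the two classes."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n]]


def knn_score(X_train, y_train, x, genes, k):
    """Fraction of the k nearest training samples labelled 1 (CR)."""
    query = [x[g] for g in genes]
    dists = sorted(
        (math.dist([row[g] for g in genes], query), label)
        for row, label in zip(X_train, y_train)
    )
    return sum(label for _, label in dists[:k]) / k


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_data(n_samples, n_genes, n_informative, rng):
    """Synthetic expression matrix; the first n_informative genes carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = CR, 0 = no CR (synthetic labels)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 1.0 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(120, 200, 20, rng)
X_test, y_test = make_data(60, 200, 20, rng)

# Gene selection uses the training split only, so the test split stays unseen.
genes = top_genes(X_train, y_train, 50)
scores = [knn_score(X_train, y_train, x, genes, k=27) for x in X_test]
auc = roc_auc(y_test, scores)
print(f"held-out AUC with top {len(genes)} genes, K = 27: {auc:.2f}")
```

Selecting genes on the training split alone mirrors the record's distinction between tuning results and the "previously unseen" test set; ranking genes on all samples would leak label information into the held-out score.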