Predicting Breast Cancer Patient Survival Using Machine Learning

Author:	David Solti, Haijun Zhai
Year of publication:	2013
Subject:	Computer science; Feature vector; Decision tree; Cancer; Machine learning; Logistic regression; Cross-validation; Naive Bayes classifier; Breast cancer; Surveillance Epidemiology and End Results; Artificial intelligence
Source:	BCB
DOI:	10.1145/2506583.2512376
Description:	Our null hypothesis was that a computer algorithm would not predict breast cancer patients' 10-year survival with greater accuracy than the 64.3% baseline of the Surveillance Epidemiology and End Results (SEER) database [3]. The aims of this study were to (1) build an infrastructure to convert SEER data into a machine-readable format; (2) train Machine Learning (ML) algorithms to predict breast cancer patients' 10-year survival; and (3) measure the predictive accuracy of the ML algorithms. We downloaded the clinical and demographic characteristics of 657,711 breast cancer patients from the SEER database and converted them into machine-readable feature vectors. An oncologist generated a list of potential variables for the ML algorithms. We trained the WEKA Machine Learning package's Logistic Regression (LR), Naive Bayes, and C4.5 Decision Tree algorithms on the data using ten-fold cross-validation. LR, Naive Bayes, and C4.5 Decision Tree achieved accuracies of 76.29%, 59.71%, and 77.43%, respectively. We compared the results of the LR algorithm with those of a well-known website, Adjuvant! Online. The results rejected the null hypothesis for LR and C4.5 Decision Tree but failed to reject it for Naive Bayes. Of the algorithms tested, C4.5 proved to be the most accurate predictor of 10-year patient survival. In addition, LR provided more accurate predictions than Adjuvant! without Adjuvant!'s limitations.
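The evaluation scheme described above (ten-fold cross-validation, with accuracy compared against a baseline) can be illustrated with a minimal sketch. This is not the authors' WEKA pipeline: it is plain Python, the data are synthetic, the feature names (`stage`, `grade`) and survival probabilities are made up for illustration, and a majority-class predictor stands in for the baseline. Only a simple categorical Naive Bayes is shown; LR and C4.5 are omitted.

```python
import math
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MajorityClass:
    """Baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, label)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(label):
            total = self.class_counts[label]
            score = math.log(total / self.n)
            for j, v in enumerate(row):
                num = self.counts[(j, label)][v] + 1
                den = total + len(self.values[j])
                score += math.log(num / den)
            return score

        return max(self.class_counts, key=log_posterior)


def cross_validated_accuracy(make_model, X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in held_out)
    return correct / len(y)


def make_synthetic(n=1000, seed=1):
    """Synthetic stand-in for patient records; NOT SEER data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        stage = rng.choice(["I", "II", "III"])   # hypothetical feature
        grade = rng.choice(["low", "high"])      # hypothetical feature
        p_survive = {"I": 0.9, "II": 0.65, "III": 0.3}[stage]
        if grade == "high":
            p_survive -= 0.1
        X.append((stage, grade))
        y.append(rng.random() < p_survive)       # True = survived
    return X, y


X, y = make_synthetic()
base_acc = cross_validated_accuracy(MajorityClass, X, y, k=10)
nb_acc = cross_validated_accuracy(CategoricalNaiveBayes, X, y, k=10)
print(f"baseline accuracy:    {base_acc:.3f}")
print(f"naive Bayes accuracy: {nb_acc:.3f}")
```

Pooling correct predictions over all ten held-out folds means every record is scored exactly once by a model that never saw it during training, which is what makes the resulting accuracy comparable to a fixed baseline.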
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::61b573dd2bc7d434f78819eba60809c5 https://doi.org/10.1145/2506583.2512376