Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database

Author:	Sandip S. Panesar, Juan C. Fernandez-Miranda, Fang-Cheng Yeh, Rhett N. D'Souza
Language:	English
Year of publication:	2018
Subject:	Machine learning (ML); Artificial neural network (ANN); Decision tree (DT); Support vector machine (SVM); Logistic regression (LR); Feature selection; Feature (machine learning); Prognostication; Categorization; Glioma; Neuro-oncology; Diagnosis; Cancer; Medicine; Surgery; Neurology (clinical); Computer science; Artificial intelligence; Database; Area under the curve (AUC); Confidence interval (CI); Positive likelihood ratio (PLR); Negative likelihood ratio (NLR); Positive predictive value (PPV); Negative predictive value (NPV); World Health Organization (WHO); Original Article; lcsh:Surgery; lcsh:RD1-811; lcsh:Neurology. Diseases of the nervous system; lcsh:RC346-429
Source:	World Neurosurgery: X, Vol 2 (2019)
DOI:	10.1101/472555
Description:	Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have traditionally been applied to large, high-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently the World Health Organization proposed a novel grading system for gliomas that incorporates molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, high-dimensional database of glioma patients. Methods: We applied three machine learning techniques, artificial neural networks (ANN), decision trees (DT) and support vector machines (SVM), alongside classical logistic regression (LR), to a dataset of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database versus a database in which only statistically significant features were included in the algorithmic inputs (feature selection). Results: Raw input consisted of 21 variables and achieved performance (accuracy/AUC) of 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. Feature-selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT. Conclusions: We demonstrate that these techniques can also be applied to small yet high-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Though local databases may be small compared with larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.
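The results above are reported as accuracy/AUC pairs. As a minimal sketch of how such a pair can be computed for a binary 2-year mortality label, the Python below uses only the standard library. The function names, the 0.5 decision threshold and the toy labels and probabilities are all hypothetical illustrations; they are not the authors' code or data, and the record does not state how the study computed its metrics.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case; ties count 0.5.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died within 2 years, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

print(f"accuracy = {accuracy(y_true, y_prob):.3f}")
print(f"AUC      = {auc(y_true, y_prob):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two figures in each reported pair can move in different directions.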
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0d609a928fbad3a13e17bf829b4d73e6