Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database

Authors: Sandip S. Panesar, Juan C. Fernandez-Miranda, Fang-Cheng Yeh, Rhett N. D'Souza
Language: English
Publication year: 2018
Subject:
ANN: Artificial neural network
Computer science
lcsh:Surgery
Decision tree
Logistic regression
Feature selection
Prognostication
SVM: Support vector machine
NLR: Negative likelihood ratio
Machine learning
computer.software_genre
WHO: World Health Organization
lcsh:RC346-429
Glioma
Diagnosis
Neuro-oncology
PLR: Positive likelihood ratio
medicine
Feature (machine learning)
Gliomas
NPV: Negative predictive value
PPV: Positive predictive value
lcsh:Neurology. Diseases of the nervous system
Artificial neural network
Database
CI: Confidence interval
business.industry
Cancer
lcsh:RD1-811
AUC: Area under the curve
medicine.disease
Support vector machine
Categorization
LR: Logistic regression
Surgery
Original Article
Neurology (clinical)
Artificial intelligence
business
computer
ML: Machine learning
DT: Decision tree
Source: World Neurosurgery: X, Vol 2, Iss, Pp-(2019)
World Neurosurgery: X
DOI: 10.1101/472555
Description: Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization, or prediction. ML techniques have traditionally been applied to large, high-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently, the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, high-dimensional database of glioma patients. Methods: We applied three ML techniques, artificial neural networks (ANN), decision trees (DT), and support vector machines (SVM), along with classical logistic regression (LR), to a dataset of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database versus a database in which only statistically significant features were included as algorithmic inputs (feature selection). Results: Raw input consisted of 21 variables and achieved performance (accuracy/AUC) of 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR, and 65%/0.70 for DT. Feature-selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR, and 65.2%/0.63 for DT. Conclusions: We demonstrate that these techniques can also be applied to small yet high-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Although local databases may be small compared with larger cancer repositories, ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.
Database: OpenAIRE
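
Note: The subject terms above list several performance measures for a binary classifier (PPV, NPV, PLR, NLR) alongside the accuracy reported in the description. As a minimal sketch of how these are conventionally derived from confusion-matrix counts, the following may help. The function name `diagnostic_metrics` and the counts used are hypothetical illustrations, not the authors' code or data, and the sketch assumes no denominator is zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard summary metrics from binary confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    Assumes every denominator below is non-zero.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "plr": sensitivity / (1 - specificity),  # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Hypothetical counts for illustration only; NOT taken from the study.
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included here because it requires the classifier's continuous scores rather than a single set of confusion-matrix counts.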