Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction

Authors: Catherine Huber-Carol, Shulamith T. Gross, Filia Vonta
Contributors: Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), Baruch College [CUNY], City University of New York [New York] (CUNY), National Technical University of Athens [Athens] (NTUA)
Publication year: 2019
Subject:
Source: Comptes Rendus Mécanique, Elsevier, 2019
ISSN: 1631-0721
DOI: 10.1016/j.crme.2019.11.007
Description: We present the statistical models most commonly used in survival data analysis. Parametric models are based on explicit distributions that depend only on unknown real-valued parameters, while the preferred models are semi-parametric, such as the Cox model, and involve unknown functions to be estimated. Now that big data sets are available, two types of methods are needed to deal with the resulting curse of dimensionality, including non-informative factors that obscure the part of the data that is informative about the target: on the one hand, methods that reduce the dimension while maximizing the information retained in the reduced data, after which classical stochastic models are applied; on the other hand, algorithms that apply directly to big data, i.e. artificial intelligence (AI, or machine learning). In fact, those algorithms have a probabilistic interpretation. We present several of the former methods. As for the latter methods, which comprise neural networks, support vector machines, random forests and more (see the second edition, January 2017, of Hastie, Tibshirani et al. (2005) [1]), we present the neural network approach. Neural networks are known to be efficient for prediction on big data. Having analyzed, using a classical stochastic model, risk factors for Alzheimer's disease on a data set of around 5000 patients and p = 17 factors, we were interested in comparing its prediction performance with that of a neural network on this relatively small sample.
Database: OpenAIRE