Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction
Authors: | Catherine Huber-Carol, Shulamith T. Gross, Filia Vonta |
---|---|
Contributors: | Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), Baruch College [CUNY], City University of New York [New York] (CUNY), National Technical University of Athens [Athens] (NTUA) |
Year of publication: | 2019 |
Subject: | Marketing, Artificial neural network, business.industry, Computer science, Strategy and Management, Big data, Statistical model, 02 engineering and technology, Machine learning, computer.software_genre, Random forest, Support vector machine, Data set, 020303 mechanical engineering & transports, 0203 mechanical engineering, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], Media Technology, General Materials Science, Artificial intelligence, business, computer, ComputingMilieux_MISCELLANEOUS, Parametric statistics, Curse of dimensionality |
Source: | Comptes Rendus Mécanique, Elsevier, 2019 |
ISSN: | 1631-0721 |
DOI: | 10.1016/j.crme.2019.11.007 |
Description: | We present the statistical models most commonly used in survival data analysis. Parametric models are based on explicit distributions that depend only on unknown real parameters, whereas the preferred models are semi-parametric, like the Cox model, and involve unknown functions to be estimated. Now that big data sets are available, two types of methods are needed to deal with the resulting curse of dimensionality, which includes non-informative factors that dilute the information relevant to the target: on the one hand, methods that reduce the dimension while maximizing the information retained in the reduced data, to which classical stochastic models are then applied; on the other hand, algorithms that apply directly to big data, i.e. artificial intelligence (AI, or machine learning). These algorithms in fact have a probabilistic interpretation. We present several of the former methods. Among the latter, which comprise neural networks, support vector machines, random forests and more (see the second edition, January 2017, of Hastie, Tibshirani et al. (2005) [1]), we present the neural network approach. Neural networks are known to be efficient for prediction on big data. Having analyzed risk factors for Alzheimer's disease with a classical stochastic model on a data set of around 5000 patients and p = 17 factors, we wanted to compare its prediction performance with that of a neural network on this relatively small data set. |
Database: | OpenAIRE |
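The abstract contrasts parametric survival models (an explicit distribution with finitely many real parameters) with the semi-parametric Cox model, whose baseline hazard is an unknown function. The following is a minimal Python sketch of that contrast, not the authors' code, on a hypothetical toy data set with a single covariate. The exponential model stands in as the parametric example (an assumption; the record does not say which distributions the paper uses). The Cox coefficient is fitted by Newton-Raphson on the partial likelihood, in which the baseline hazard cancels, and the Breslow estimator then recovers the baseline cumulative hazard. All names and data are hypothetical, and tied event times are assumed absent.

```python
import math

# Hypothetical toy data: (follow-up time, event flag, covariate x).
# Event flag: 1 = event observed, 0 = right-censored. No tied event times.
DATA = [
    (2.0, 1, 1.0), (3.0, 1, 0.0), (4.0, 0, 1.0), (5.0, 1, 1.0),
    (6.0, 0, 0.0), (7.0, 1, 0.0), (8.0, 0, 1.0), (9.0, 1, 0.0),
]


def exponential_mle(data):
    """Parametric example: constant hazard, a single real parameter.
    Under right censoring the MLE is (number of events) / (total follow-up time)."""
    events = sum(d for _, d, _ in data)
    exposure = sum(t for t, _, _ in data)
    return events / exposure


def cox_score_info(data, beta):
    """Score and observed information of the Cox partial log-likelihood
    for one covariate; the baseline hazard h0(t) does not appear."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in data:
        if not d_i:
            continue
        # Risk set at t_i: subjects whose follow-up time is >= t_i.
        xs = [x for t, _, x in data if t >= t_i]
        w = [math.exp(beta * x) for x in xs]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, xs))
        s2 = sum(wi * x * x for wi, x in zip(w, xs))
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean * mean  # weighted variance of x in the risk set
    return score, info


def cox_fit(data, tol=1e-10, max_iter=50):
    """Newton-Raphson for beta in h(t | x) = h0(t) * exp(beta * x)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(data, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def breslow_cumulative_hazard(data, beta, t):
    """Breslow estimate of the unknown function H0(t), the cumulative baseline hazard."""
    total = 0.0
    for t_i, d_i, _ in data:
        if d_i and t_i <= t:
            total += 1.0 / sum(math.exp(beta * x) for s, _, x in data if s >= t_i)
    return total


if __name__ == "__main__":
    beta = cox_fit(DATA)
    print("exponential rate:", exponential_mle(DATA))
    print("Cox beta:", beta, "hazard ratio:", math.exp(beta))
    print("H0(5):", breslow_cumulative_hazard(DATA, beta, 5.0))
```

The point of the design is visible in the signatures: the parametric fit returns one number, while the Cox fit returns a coefficient plus a whole function of time (here evaluated pointwise by `breslow_cumulative_hazard`).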
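The abstract describes comparing the prediction performance of a classical stochastic model with that of a neural network, but the record does not say which metric was used. Harrell's concordance index is one common choice for right-censored outcomes, so the sketch below assumes it; it is an illustration, not the paper's evaluation procedure. It only scores risk predictions, so the "Cox" and "neural network" scores in the demo are hypothetical numbers, not fitted models. For brevity, pairs with tied follow-up times are skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[j] > times[i]. It is concordant when subject i, who failed first,
    has the higher predicted risk; tied risk scores count 1/2.
    Pairs with tied times are skipped to keep the sketch short.
    """
    concordant, comparable = 0.0, 0
    for t_i, e_i, r_i in zip(times, events, risk_scores):
        if not e_i:
            continue  # a censored subject cannot be the earlier failure
        for t_j, r_j in zip(times, risk_scores):
            if t_j > t_i:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical held-out test set and risk scores from two models.
    times = [1.5, 2.0, 3.5, 4.0, 6.0, 7.5]
    events = [1, 1, 0, 1, 1, 0]
    cox_risk = [2.1, 1.4, 0.9, 1.0, 0.2, 0.1]  # e.g. Cox linear predictor
    nn_risk = [1.8, 1.9, 0.5, 0.7, 0.6, 0.3]   # e.g. neural-network output
    print("C-index, Cox model:", concordance_index(times, events, cox_risk))
    print("C-index, neural net:", concordance_index(times, events, nn_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking; because the index depends only on the ordering of risk scores, it can compare a Cox linear predictor and a neural-network output on the same footing.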