Exploring variable selection in additive mixed effects models using group lasso
Authors: | Avalos, Marta; Soret, Perrine; Meza, Cristian; Bertin, Karine; Ren, Hao; Hellard, Philippe |
---|---|
Contributors: | Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Bordeaux population health (BPH), Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED), Institut National de la Santé et de la Recherche Médicale (INSERM), Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2, Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Centro de Investigación y Modelamiento de Fenómenos Aleatorios – Valparaíso (CIMFAV), Universidad de Valparaiso [Chile], Université de Technologie de Compiègne (UTC), Fédération Française de Natation (FFN), Institut de recherche biomédicale et d'épidémiologie du sport (IRMES - EA 7329), Université Paris Descartes - Paris 5 (UPD5), Institut national du sport, de l'expertise et de la performance (INSEP), Statistical Society of Australia (SSA). This research was partially funded by the French Institute of Sport, Expertise and Performance (INSEP) under grant no. 14r21. |
Language: | English |
Year of publication: | 2016 |
Subject: | [STAT.AP] Statistics [stat]/Applications [stat.AP]; [STAT.ME] Statistics [stat]/Methodology [stat.ME]; [STAT.CO] Statistics [stat]/Computation [stat.CO]; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; [SDV.SPEE] Life Sciences [q-bio]/Public health and epidemiology; Longitudinal data; EM algorithm; L1-penalty; Sport science data |
Source: | 23rd Australian Statistical Conference, Statistical Society of Australia (SSA), Dec 2016, Canberra, Australia (http://asc2016.com.au/) |
Description: | We consider the problem of estimating a high-dimensional additive mixed model for longitudinal data using sparse methods. In this setting, multiple measurements are made on the same subject over time, so the different sources of variability (intra- and inter-subject) and the correlation within subjects must be taken into account. Moreover, the relationships between the explanatory variables and the outcome may be nonlinear. In addition, the number of explanatory variables may exceed the sample size, while only a small subset of them contributes to the response. Several computational approaches to high-dimensional additive modelling of independent data have been developed in the literature, and Amato and colleagues (Stat Methods Appl 2016; s10260-016-0357-8) recently conducted a comprehensive review of these methods. Efficient regularized estimation procedures for variable selection in nonparametric additive models rely on basis function approximations. These authors also proposed a reformulation of the estimation problem in terms of the group Lasso, from which convergence and asymptotic optimality properties can be derived. Only a few works have proposed Lasso-type methods for analysing high-dimensional longitudinal data with additive mixed models. The resulting estimators depend on a relatively small number of basis functions, but they do not directly encourage variable selection. In this study we explore the extension of the group Lasso penalty to additive mixed effects models. We discuss computational aspects, including a comparison of group Lasso algorithms implemented in publicly available R code, the choice of the optimal regularization parameter, and the links between the algorithms that estimate the mean and the covariance parameters. We illustrate the value of such approaches in the analysis of a twenty-year longitudinal study of the training practices of elite athletes. |
Database: | OpenAIRE |
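The abstract does not spell out an algorithm, so the following is only a minimal sketch, under stated assumptions, of how a group Lasso penalty can be combined with mixed-model variance estimation. Each covariate is expanded into a small basis (a centred cubic polynomial here, standing in for the spline bases usually used), its coefficients form one group penalised by λ√p_k‖β_k‖₂, and the fit alternates a proximal-gradient group Lasso step for the additive mean with EM updates for a single random intercept per subject. This is not the authors' method and not the implementation of any R package; the function names (`poly_basis`, `group_lasso`, `fit_additive_mixed`), the penalty value `lam=0.05` and the simulated data are hypothetical. In practice λ would be chosen by cross-validation or an information criterion, and no convergence guarantee is claimed for this alternating scheme.

```python
import numpy as np


def poly_basis(x, degree=3):
    """Centred polynomial basis for one covariate, orthonormalised by QR.

    Hypothetical stand-in for the spline bases used in additive modelling.
    """
    B = np.column_stack([x ** d for d in range(1, degree + 1)])
    B = B - B.mean(axis=0)          # centre: each group is orthogonal to the intercept
    Q, _ = np.linalg.qr(B)          # orthonormal columns within the group
    return Q * np.sqrt(len(x))      # rescale so that Q.T @ Q / n = I


def group_lasso(X, r, groups, lam, beta0, n_iter=500):
    """Proximal gradient for (1/2n)||r - X beta||^2 + lam * sum_k sqrt(p_k) ||beta_k||_2."""
    n = X.shape[0]
    step = n / np.linalg.eigvalsh(X.T @ X).max()   # 1 / Lipschitz constant of the gradient
    beta = beta0.copy()
    for _ in range(n_iter):
        z = beta + step * X.T @ (r - X @ beta) / n  # gradient step on the squared loss
        for g in groups:                            # blockwise soft-thresholding (prox step)
            norm = np.linalg.norm(z[g])
            thr = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thr else (1.0 - thr / norm) * z[g]
        beta = z
    return beta


def fit_additive_mixed(y, Xraw, subject, lam, degree=3, n_outer=30):
    """Alternate a group Lasso fit of the additive mean with EM updates for a
    random-intercept model y_ij = mu + sum_k f_k(x_ijk) + b_i + eps_ij (an assumption)."""
    n, p = Xraw.shape
    X = np.hstack([poly_basis(Xraw[:, k], degree) for k in range(p)])
    groups = [np.arange(k * degree, (k + 1) * degree) for k in range(p)]
    _, subj = np.unique(subject, return_inverse=True)
    n_i = np.bincount(subj)
    beta, b = np.zeros(X.shape[1]), np.zeros(len(n_i))
    sigma2_b, sigma2 = 1.0, 1.0
    for _ in range(n_outer):
        # Mean step: intercept and penalised additive components given current b
        mu = np.mean(y - b[subj] - X @ beta)
        beta = group_lasso(X, y - mu - b[subj], groups, lam, beta)
        e = y - mu - X @ beta
        # E-step: posterior mean and variance of each random intercept
        b = sigma2_b * np.bincount(subj, weights=e) / (sigma2 + n_i * sigma2_b)
        v = sigma2_b * sigma2 / (sigma2 + n_i * sigma2_b)
        # M-step: update the variance components
        sigma2_b = np.mean(b ** 2 + v)
        sigma2 = (np.sum((e - b[subj]) ** 2) + np.sum(n_i * v)) / n
    selected = [k for k, g in enumerate(groups) if np.linalg.norm(beta[g]) > 0]
    return {"mu": mu, "beta": beta, "b": b, "sigma2_b": sigma2_b,
            "sigma2": sigma2, "selected": selected}


# Hypothetical simulated data: 40 subjects, 10 visits each, 8 covariates,
# of which only the first two affect the outcome (nonlinearly).
rng = np.random.default_rng(0)
m, n_per, p = 40, 10, 8
subject = np.repeat(np.arange(m), n_per)
Xraw = rng.uniform(-1, 1, size=(m * n_per, p))
b_true = rng.normal(0.0, 1.0, size=m)
y = (np.sin(np.pi * Xraw[:, 0]) + Xraw[:, 1] ** 2
     + b_true[subject] + rng.normal(0.0, 0.3, size=m * n_per))

fit = fit_additive_mixed(y, Xraw, subject, lam=0.05)
print("selected covariates:", fit["selected"])
print("variance components:", fit["sigma2_b"], fit["sigma2"])
```

Warm-starting each group Lasso solve at the previous coefficients keeps the inner loop cheap, and because the penalty acts on whole groups, a covariate is either dropped entirely or kept with its full nonlinear effect, which is what makes the group formulation a variable-selection device rather than only a basis-function sparsifier.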