Variable selection in multivariate linear models with high-dimensional covariance matrix estimation
Authors: | Céline Lévy-Leduc, Marie Perrot-Dockès, Julien Chiquet, Laure Sansonnet |
---|---|
Contributors: | Mathématiques et Informatique Appliquées (MIA-Paris), AgroParisTech-Institut National de la Recherche Agronomique (INRA) |
Subject: | 0301 basic medicine; Statistics and Probability; Variable selection; Feature selection; Mathematics - Statistics Theory; Statistics Theory (math.ST); 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; Lasso (statistics); FOS: Mathematics; Applied mathematics; [INFO]Computer Science [cs]; 0101 mathematics; [MATH]Mathematics [math]; Coefficient matrix; Multivariate linear model; Mathematics; Numerical Analysis; bepress / Physical Sciences and Mathematics / Mathematics; Covariance matrix; Null (mathematics); Linear model; Estimator; High-dimensional covariance matrix estimation; Toeplitz matrix; 030104 developmental biology; Statistics, Probability and Uncertainty; Lasso |
Source: | Journal of Multivariate Analysis, Elsevier, 2018, vol. 166, pp. 78-97. DOI: 10.1016/j.jmva.2018.02.006 |
ISSN: | 0047-259X; 1095-7243 |
Description: | In this paper, we propose a novel variable selection approach in the framework of multivariate linear models that takes into account the dependence that may exist between the responses. It consists in first estimating the covariance matrix Σ of the responses and then plugging this estimator into a Lasso criterion in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated from both a theoretical and a numerical point of view. More precisely, we give general conditions that the estimators of the covariance matrix and of its inverse must satisfy in order to recover the positions of the null and non-null entries of the coefficient matrix when the size of Σ is not fixed and can tend to infinity. We prove that these conditions are satisfied in the particular case of some Toeplitz matrices. Our approach is implemented in the R package MultiVarSel, available from the Comprehensive R Archive Network (CRAN), and is very attractive because of its low computational load. We also assess the performance of our methodology on synthetic data and compare it with alternative approaches. Our numerical experiments show that including the estimation of the covariance matrix in the Lasso criterion dramatically improves the variable selection performance in many cases. |
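The two-step idea in the description (estimate Σ, then plug it into a Lasso criterion) can be illustrated with a minimal NumPy sketch. One way to plug in Σ is to whiten the model Y = XB + E as YΣ^(-1/2) = XBΣ^(-1/2) + EΣ^(-1/2), then vectorize it with vec(XBS) = (Sᵀ ⊗ X) vec(B) and run an ordinary Lasso on vec(B). The sketch makes several assumptions that are not taken from the record. It uses an AR(1) Toeplitz structure for Σ, a lag-1 residual estimator for its parameter, a hand-written coordinate-descent Lasso, and a fixed penalty `lam`. The function names `ar1_toeplitz`, `lasso_cd` and `whitened_lasso` are hypothetical. This is not the MultiVarSel implementation or its API.

```python
import numpy as np


def ar1_toeplitz(q, phi, sigma2=1.0):
    """AR(1) Toeplitz covariance (an assumed special case): sigma2 * phi**|i-j| / (1 - phi**2)."""
    idx = np.arange(q)
    return sigma2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi ** 2)


def lasso_cd(Z, y, lam, n_iter=300):
    """Coordinate-descent Lasso minimizing (1 / (2N)) * ||y - Z b||^2 + lam * ||b||_1."""
    N, d = Z.shape
    b = np.zeros(d)
    col_sq = (Z ** 2).sum(axis=0) / N
    r = y.astype(float).copy()  # current residual y - Z b
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += Z[:, j] * b[j]  # remove coordinate j from the fit
            rho = Z[:, j] @ r / N
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= Z[:, j] * b[j]  # put the updated coordinate back
    return b


def whitened_lasso(X, Y, lam):
    """Hypothetical helper: estimate an AR(1) Sigma, whiten Y, then run a Lasso on vec(B)."""
    n, p = X.shape
    q = Y.shape[1]
    # 1. Residuals of a least-squares fit (assumes n > p).
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B_ols
    # 2. Lag-1 autocorrelation along the response index gives phi_hat (AR(1) assumption).
    phi = np.sum(E[:, :-1] * E[:, 1:]) / np.sum(E[:, :-1] ** 2)
    phi = float(np.clip(phi, -0.99, 0.99))
    # 3. Symmetric inverse square root S of the estimated Sigma.
    w, V = np.linalg.eigh(ar1_toeplitz(q, phi))
    S = V @ np.diag(w ** -0.5) @ V.T
    # 4. vec(Y S) = (S^T kron X) vec(B) + whitened noise; vec stacks columns (order="F").
    y = (Y @ S).flatten(order="F")
    Z = np.kron(S.T, X)
    b = lasso_cd(Z, y, lam)
    return b.reshape((p, q), order="F"), phi


# Illustration on synthetic data (sizes, coefficients and lam are arbitrary choices).
rng = np.random.default_rng(0)
n, p, q, phi_true = 100, 5, 6, 0.7
B_true = np.zeros((p, q))
B_true[0, 1], B_true[2, 4], B_true[4, 0] = 3.0, -3.0, 3.0
X = rng.standard_normal((n, p))
L = np.linalg.cholesky(ar1_toeplitz(q, phi_true))
Y = X @ B_true + rng.standard_normal((n, q)) @ L.T  # each noise row has covariance Sigma
B_hat, phi_hat = whitened_lasso(X, Y, lam=0.1)
print(round(phi_hat, 2))
print(np.argwhere(B_hat != 0))
```

Here the penalty is fixed by hand. In practice it would need to be tuned, for instance by cross-validation. The dense Kronecker design has nq rows and pq columns, so this sketch only suits small problems.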
Database: | OpenAIRE |