Model selection by resampling penalization

Author: Sylvain Arlot
Contributors: Laboratoire d'informatique de l'École normale supérieure (LIENS); Département d'informatique - ENS Paris (DI-ENS); Models of visual object recognition and scene understanding (WILLOW); Inria Paris-Rocquencourt; Laboratoire de Mathématiques d'Orsay (LM-Orsay); Centre National de la Recherche Scientifique (CNRS); Institut National de Recherche en Informatique et en Automatique (Inria); École normale supérieure - Paris (ENS Paris); Université Paris sciences et lettres (PSL); Université Paris-Sud - Paris 11 (UP11)
Language: English
Year of publication: 2007
Subject:
Statistics and Probability
Mathematical optimization
Heteroscedasticity
model selection
Statistics::Theory
penalization
histogram selection
Mathematics - Statistics Theory
02 engineering and technology
Statistics Theory (math.ST)
oracle inequality
01 natural sciences
adaptivity
V-fold cross-validation
010104 statistics & probability
non-parametric statistics
resampling
[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]
62G08
62G09
62M20
0202 electrical engineering, electronic engineering, information engineering
FOS: Mathematics
Statistics::Methodology
0101 mathematics
non-parametric regression
Mathematics
Smoothness (probability theory)
Statistics::Applications
Mathematical statistics
Estimator
020206 networking & telecommunications
[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]
Statistics::Computation
regressogram
regression
Statistics
Probability and Uncertainty
heteroscedastic data
exchangeable weighted bootstrap
Source: Electronic Journal of Statistics, Shaker Heights, OH: Institute of Mathematical Statistics, 2009, 3, pp. 557--624. ⟨10.1214/08-EJS196⟩
ISSN: 1935-7524
DOI: 10.1214/08-EJS196
Abstract: In this paper, a new family of resampling-based penalization procedures for model selection is defined in a general framework. It generalizes several methods, including Efron's bootstrap penalization and the leave-one-out penalization recently proposed by Arlot (2008), to any exchangeable weighted bootstrap resampling scheme. In the heteroscedastic regression framework, assuming the models to have a particular structure, these resampling penalties are proved to satisfy a non-asymptotic oracle inequality with leading constant close to 1; in particular, they are asymptotically optimal. Resampling penalties are used to define an estimator that adapts simultaneously to the smoothness of the regression function and to the heteroscedasticity of the noise. This is remarkable because resampling penalties are general-purpose devices that were not built specifically to handle heteroscedastic data; hence, resampling penalties naturally adapt to heteroscedasticity. A simulation study shows that resampling penalties improve on V-fold cross-validation in terms of final prediction error, in particular when the signal-to-noise ratio is not large.
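The idea sketched in the abstract can be illustrated with a minimal, simplified example: for each candidate regressogram (piecewise-constant fit on a regular partition of [0, 1]), the penalty is a Monte Carlo estimate of the expected gap between the empirical risk and the resampled risk of the resampled estimator, here with Efron bootstrap weights as one choice of exchangeable weights. Everything below (the regular partitions, the constant set to 1, the number of resamples, the fallback for bins receiving zero resample weight) is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_regressogram(x, y, w, n_bins):
    """Weighted piecewise-constant fit on a regular partition of [0, 1].
    Returns per-bin means and each point's bin index. Bins whose total
    resample weight is zero fall back to the unweighted bin mean
    (a pragmatic choice for this sketch)."""
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    mu = np.zeros(n_bins)
    for k in range(n_bins):
        mask = bins == k
        if mask.any():
            wk = w[mask]
            mu[k] = np.average(y[mask], weights=wk) if wk.sum() > 0 else y[mask].mean()
    return mu, bins

def resampling_penalty(x, y, n_bins, n_resamples=50, const=1.0):
    """Monte Carlo estimate of a resampling penalty:
    pen(m) ~ const * E_W[ P_n gamma(s_m^W) - P_n^W gamma(s_m^W) ],
    using Efron bootstrap weights W ~ Multinomial(n, 1/n)."""
    n = len(y)
    gaps = []
    for _ in range(n_resamples):
        w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        mu, bins = fit_regressogram(x, y, w, n_bins)
        resid2 = (y - mu[bins]) ** 2
        emp_risk = resid2.mean()            # P_n gamma(s_m^W): risk on the full sample
        boot_risk = (w * resid2).sum() / n  # P_n^W gamma(s_m^W): risk under the weights
        gaps.append(emp_risk - boot_risk)
    return const * float(np.mean(gaps))

# Usage on synthetic heteroscedastic data: the noise level grows with x.
n = 300
x = rng.uniform(size=n)
y = np.sin(6 * x) + (0.1 + 0.4 * x) * rng.normal(size=n)

ones = np.ones(n)
scores = {}
for m in [1, 2, 4, 8, 16, 32]:
    mu, bins = fit_regressogram(x, y, ones, m)
    emp_risk = ((y - mu[bins]) ** 2).mean()
    scores[m] = emp_risk + resampling_penalty(x, y, m)
best = min(scores, key=scores.get)
print("selected number of bins:", best)
```

The selected model minimizes empirical risk plus the resampling penalty; because the penalty is estimated from the data themselves, no explicit model of the noise variance is needed, which is the sense in which such penalties adapt to heteroscedasticity.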
Comment: extended version of hal-00125455, with a technical appendix
Database: OpenAIRE