A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part I. Search algorithm, theory and simulations

Autor: Baumann, K., Albert, H., Korff, M. von
Zdroj: Journal of Chemometrics; July 2002, Vol. 16 Issue: 7 p339-350, 12p
Abstrakt: Variable selection is an extensively studied problem in chemometrics and in the area of quantitative structure–activity relationships (QSARs). Many search algorithms have been compared so far. Less well studied is the influence of different objective functions on the prediction quality of the selected models. This paper investigates the performance of different cross-validation techniques as objective function for variable selection in latent variable regression. The results are compared in terms of predictive ability, model size (number of variables) and model complexity (number of latent variables). It will be shown that leave-multiple-out cross-validation with a large percentage of data left out performs best. Since leave-multiple-out cross-validation is computationally expensive, a very efficient tabu search algorithm is introduced to lower the computational burden. The tabu search algorithm needs no user-defined operational parameters and optimizes the variable subset and the number of latent variables simultaneously. Copyright © 2002 John Wiley & Sons, Ltd.
Databáze: Supplemental Index