Pseudo-sample trajectories for variable interaction detection in Dissimilarity Partial Least Squares

Authors: Geert Postma, Lutgarde M. C. Buydens, Jasper Engel, I. van Peufflik, Lionel Blanchet
Year of publication: 2015
Subject:
Source: Chemometrics and Intelligent Laboratory Systems, 146, pp. 89-101
ISSN: 0169-7439
Description: In linear regression and classification, variable interactions are neglected unless explicitly modelled. In practice, however, explicit modelling of interactions is often not feasible. For example, in high-dimensional data it is often not known which variables interact, and the number of variables is too large to investigate all possible interactions. Ignoring interactions can be detrimental to the performance of the model. Additionally, important variables related to the study might not be identified. To remedy these issues, kernel-based regression and classification techniques are often used instead. These methods are capable of automatically including variable interactions. In this paper we focus on Dissimilarity Partial Least Squares, a kernel-based method that does not require kernel optimisation. The main disadvantage of kernel-based methods is that their interpretation is notoriously difficult because they are essentially black-box techniques. Recently, the so-called pseudo-sample approach was proposed to retrieve and visualise the contribution of variables. The method has been used successfully in a number of applications ranging from industrial process control to metabolomics. One shortcoming so far, which may be crucial for model interpretation, is that the approach cannot be used to visualise variable interactions. In this work, we propose a simple extension of the pseudo-sample approach that makes it possible to visualise variable contributions related to interactions. An associated quantitative measure for the detection of interacting variables is introduced as well. The proposed methodology is extensively tested on real data as well as on simulated data sets involving various contributions of the main effects, interaction effects and noise.
Database: OpenAIRE