An improved successive projections algorithm version to variable selection in multiple linear regression.

Authors: Canova LDS (Instituto de Química, IQ, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500 Agronomia, 91501970, Porto Alegre, RS, Brazil); Vallese FD (Dpto. de Química, Universidad Nacional del Sur, INQUISUR, Av. Alem 1253, B8000CPB, Bahía Blanca, Buenos Aires, Argentina); Pistonesi MF (Dpto. de Química, Universidad Nacional del Sur, INQUISUR, Av. Alem 1253, B8000CPB, Bahía Blanca, Buenos Aires, Argentina); de Araújo Gomes A (Instituto de Química, IQ, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500 Agronomia, 91501970, Porto Alegre, RS, Brazil; electronic address: araujo.gomes@ufrgs.br).
Language: English
Source: Analytica chimica acta [Anal Chim Acta] 2023 Sep 15; Vol. 1274, pp. 341560. Date of Electronic Publication: 2023 Jun 26.
DOI: 10.1016/j.aca.2023.341560
Abstract: The aim of the successive projections algorithm (SPA) is to enhance the accuracy of multiple linear regression (MLR) models by minimizing the impact of collinearity in the calibration data set. Combining SPA with MLR as a variable selection approach has resulted in the SPA-MLR method, which has been reported in the literature to produce, in some cases, models whose prediction ability compares favorably with that of conventional full-spectrum partial least squares (PLS) models. This paper proposes adding a filter step to the current version of the SPA algorithm to reduce the number of uninformative variables before the projection phase and to help the algorithm select the best variables in subsequent steps. The proposed fSPA-MLR algorithm is evaluated in two case studies involving the near-infrared spectrometric analysis of pharmaceutical tablets and diesel/biodiesel mixtures. Compared to PLS, the fSPA-MLR models demonstrate similar or better performance. Moreover, the fSPA-MLR models outperform the original SPA-MLR in both cross-validation and external prediction. The fSPA-MLR models deliver superior results regardless of the preprocessing tested, whether first-derivative Savitzky-Golay (SG) filtering or standard normal variate (SNV), and even with raw spectra.
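The filter-then-project idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' fSPA implementation. The projection phase follows the classic SPA scheme: starting from one column, the remaining columns are repeatedly projected onto the orthogonal complement of the last selected column, and the column with the largest projected norm is picked next, so nearly collinear variables are skipped. The abstract does not specify the filter, so `correlation_filter` (keep the variables most correlated with the response) is a hypothetical stand-in. Choosing the starting column and chain length by validation RMSE is likewise an assumption. All function names (`correlation_filter`, `spa_chain`, `mlr_fit`, `spa_select`, `fspa_mlr`) are hypothetical.

```python
import numpy as np


def correlation_filter(X, y, keep):
    """HYPOTHETICAL filter step (not the paper's filter): keep the `keep`
    columns of X with the largest |Pearson correlation| with y.
    Assumes no column of X is constant."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:keep])


def spa_chain(X, k0, n_vars):
    """Classic SPA projection phase starting from column k0.
    n_vars must not exceed the rank of X."""
    Xp = np.array(X, dtype=float)  # working copy of the (projected) columns
    selected = [k0]
    for _ in range(n_vars - 1):
        xk = Xp[:, selected[-1]]
        # Project every column onto the orthogonal complement of xk
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select a column already in the chain
        selected.append(int(np.argmax(norms)))
    return selected


def mlr_fit(X, y):
    """Ordinary least-squares MLR with intercept; returns [b0, b1, ..., bm]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def spa_select(Xcal, ycal, Xval, yval, max_vars):
    """Try every starting column and every chain length up to max_vars;
    keep the subset whose MLR model has the lowest validation RMSE.
    Shorter chains are prefixes of the full chain, so one chain per start suffices."""
    best_rmse, best_cols = np.inf, None
    for k0 in range(Xcal.shape[1]):
        chain = spa_chain(Xcal, k0, max_vars)
        for m in range(1, max_vars + 1):
            cols = chain[:m]
            coef = mlr_fit(Xcal[:, cols], ycal)
            pred = coef[0] + Xval[:, cols] @ coef[1:]
            rmse = float(np.sqrt(np.mean((yval - pred) ** 2)))
            if rmse < best_rmse:
                best_rmse, best_cols = rmse, cols
    return best_cols, best_rmse


def fspa_mlr(Xcal, ycal, Xval, yval, keep, max_vars):
    """Filter first, then run SPA on the surviving columns only.
    Returns selected column indices in the ORIGINAL X and the validation RMSE."""
    kept = correlation_filter(Xcal, ycal, keep)
    cols, rmse = spa_select(Xcal[:, kept], ycal, Xval[:, kept], yval, max_vars)
    return [int(kept[c]) for c in cols], rmse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(250, 10))     # synthetic data: 250 samples x 10 variables
    y = 1.0 + 2.0 * X[:, 3] - X[:, 7]  # response depends on variables 3 and 7 only
    cols, rmse = fspa_mlr(X[:200], y[:200], X[200:], y[200:], keep=6, max_vars=6)
    print("selected variables:", cols, "validation RMSE:", rmse)
```

The filter runs before any projection, so the projection phase only ranks variables that already carry some information about the response; this mirrors the motivation stated in the abstract, even though the actual filter criterion used in fSPA may differ.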
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2023 Elsevier B.V. All rights reserved.)
Database: MEDLINE