A Bootstrapping Soft Shrinkage Approach and Interval Random Variables Selection Hybrid Model for Variable Selection in Near-Infrared Spectroscopy
Authors: | Kim Seng Chia, Wan Saiful-Islam Wan Salam, Ammar Abdo Mohammed Haidar Mahdi, Hitham Alhussian, Abdul-Malik H. Y. Saad, N. A. M. Alduais, Hasan Ali Gamal Al-kaf, Abdulqader M. Mohsen |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
General Computer Science; near infrared spectroscopy; General Engineering; Feature selection; Interval (mathematics); weighted bootstrap sampling; Variable (computer science); Hybrid variable selection; Boss; Statistics; partial least squares; model population analysis; General Materials Science; lcsh:Electrical engineering. Electronics. Nuclear engineering; Random variable; lcsh:TK1-9971; Selection (genetic algorithm); Bootstrapping (statistics); Mathematics; Shrinkage |
Source: | IEEE Access, Vol 8, Pp 168036-168052 (2020) |
ISSN: | 2169-3536 |
Description: | The high dimensionality of spectral datasets is a significant challenge for researchers and calls for effective methods that extract a variable subset capable of improving the accuracy of predictions or classifications. In this study, a hybrid variable selection method is proposed that incrementally increases the number of variables by combining the bootstrapping soft shrinkage (BOSS) method with the interval random variable selection (IRVS) method; it is named BOSS-IRVS. The BOSS method is used to determine the informative intervals, while the IRVS method is used to search for informative variables within the intervals determined by BOSS. The proposed BOSS-IRVS method was tested on seven publicly accessible near-infrared (NIR) spectroscopic datasets covering corn, diesel fuel, soy, wheat protein, and hemoglobin. Its performance was compared with that of two established variable selection methods, namely BOSS and a hybrid variable selection strategy based on continuous shrinkage of the variable space (VCPA-IRIV). The experimental results showed that BOSS-IRVS outperforms the VCPA-IRIV and BOSS methods on all tested datasets, improving prediction accuracy by 15.4% and 15.3% for corn moisture, 13.4% and 49.8% for corn oil, 41.5% and 50.6% for corn protein, 12.6% and 5.6% for soy moisture, 0.6% and 6.3% for total diesel fuel, 19.9% and 14.3% for wheat protein, and 5.8% and 20.3% for hemoglobin. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
Database: | OpenAIRE |
External link: |
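
The record lists "weighted bootstrap sampling" among its subject terms but does not describe how the sampling is carried out. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below draws variable indices with replacement in proportion to a weight and keeps the distinct indices as one candidate subset. The function name `weighted_bootstrap_sample`, the example weights, and the choice to draw as many times as there are variables are all assumptions made for this example; how the weights would be derived from fitted models is not covered here.

```python
import random


def weighted_bootstrap_sample(weights, rng):
    """Return the distinct variable indices obtained from one weighted
    bootstrap draw (hypothetical helper, for illustration only)."""
    n = len(weights)
    # Sample n indices with replacement; the chance of picking an index
    # is proportional to its weight, so zero-weight variables never appear.
    drawn = rng.choices(range(n), weights=weights, k=n)
    # Duplicates are collapsed: a subset records which variables were hit.
    return sorted(set(drawn))


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [0.0, 0.5, 0.0, 2.0, 1.0, 0.0]  # made-up weights for six variables
subsets = [weighted_bootstrap_sample(weights, rng) for _ in range(5)]

for subset in subsets:
    print(subset)
```

Each printed subset can contain only variables 1, 3, and 4, because the other weights are zero; variables with larger weights tend to appear in more of the subsets.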