Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression
Author: | Estrella Lucena-Sánchez, Fernando Jiménez, Gracia Sánchez, Guido Sciavicco |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: | Multivariate statistics; General Computer Science; Computer science; General Engineering; Evolutionary algorithm; Feature selection; Regression analysis; Underground water contamination; Evolutionary computation; Regression; TK1-9971; Multi-objective optimization; Outlier; Outlier detection; General Materials Science; Anomaly detection; Data mining; Electrical engineering. Electronics. Nuclear engineering |
Source: | IEEE Access, Vol. 9, pp. 135675-135688 (2021) |
ISSN: | 2169-3536 |
Description: | When investigating the causes of contamination in specific contexts, such as underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predictor variables and detecting the instances that may be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously, so that outliers are detected with respect to the specific variables selected for the regression, and we implement this optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models. |
Database: | OpenAIRE |
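The description frames variable selection and outlier detection as a single multi-objective problem. The sketch below illustrates that idea under assumptions of our own, not details taken from the paper: each candidate is a pair of binary masks (one over the variables, one over the instances); its objectives are the training RMSE of an ordinary least-squares fit on the kept instances using only the selected variables, the number of selected variables, and the number of flagged instances, all minimized. The abstract only says the model is implemented with "a well-known evolutionary algorithm" and does not name it, so the naive mutation-plus-Pareto-archive loop here is a stand-in. The names `evaluate`, `dominates`, and `search` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Mask = Tuple[bool, ...]               # one flag per variable or per instance
Objectives = Tuple[float, int, int]   # (RMSE, #variables, #outliers), all minimized


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular normal equations")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def evaluate(X, y, features: Mask, outliers: Mask) -> Objectives:
    """OLS fit on non-flagged instances using only the selected variables."""
    rows = [i for i, out in enumerate(outliers) if not out]
    cols = [j for j, sel in enumerate(features) if sel]
    A = [[1.0] + [X[i][j] for j in cols] for i in rows]  # intercept column first
    t = [y[i] for i in rows]
    k = len(cols) + 1
    rmse = float("inf")  # penalty for under-determined or singular fits
    if len(rows) > k:
        ata = [[sum(r[p] * r[q] for r in A) for q in range(k)] for p in range(k)]
        aty = [sum(r[p] * ti for r, ti in zip(A, t)) for p in range(k)]
        try:
            beta = _solve(ata, aty)
            sse = sum((ti - sum(bj * v for bj, v in zip(beta, r))) ** 2
                      for r, ti in zip(A, t))
            rmse = (sse / len(rows)) ** 0.5
        except ValueError:
            pass
    return (rmse, len(cols), sum(outliers))


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def search(X, y, generations: int = 200, seed: int = 0) -> Dict[Tuple[Mask, Mask], Objectives]:
    """Naive mutation + non-dominated archive (a stand-in, not the paper's EA)."""
    rng = random.Random(seed)
    n_feat, n_inst = len(X[0]), len(y)
    start = (tuple([True] * n_feat), tuple([False] * n_inst))
    archive = {start: evaluate(X, y, *start)}
    for _ in range(generations):
        feats, outs = (list(m) for m in rng.choice(list(archive)))
        if rng.random() < 0.5:          # flip one variable in or out of the model
            j = rng.randrange(n_feat)
            feats[j] = not feats[j]
        else:                           # flag or un-flag one instance as an outlier
            i = rng.randrange(n_inst)
            outs[i] = not outs[i]
        child = (tuple(feats), tuple(outs))
        if child in archive:
            continue
        obj = evaluate(X, y, *child)
        if any(dominates(o, obj) or o == obj for o in archive.values()):
            continue
        # keep the archive mutually non-dominated
        archive = {c: o for c, o in archive.items() if not dominates(obj, o)}
        archive[child] = obj
    return archive
```

Calling `search(X, y)` returns a dictionary mapping each non-dominated (variable mask, outlier mask) pair to its objective vector; an analyst could then choose a trade-off between goodness of fit, model size, and the number of discarded instances. Because outliers are flagged inside the same candidate as the variable subset, an instance is judged an outlier only relative to the model built on those variables, which mirrors the motivation stated in the description.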