Autor: |
Fernando Jimenez, Estrella Lucena-Sanchez, Gracia Sanchez, Guido Sciavicco |
Jazyk: |
angličtina |
Rok vydání: |
2021 |
Předmět: |
|
Zdroj: |
IEEE Access, Vol 9, Pp 135675-135688 (2021) |
Druh dokumentu: |
article |
ISSN: |
2169-3536 |
DOI: |
10.1109/ACCESS.2021.3115848 |
Popis: |
When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously in such a way that outliers can be detected in reference to the specific variables that are selected for the regression, and we implement such an optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underwater water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models. |
Databáze: |
Directory of Open Access Journals |
Externí odkaz: |
|