EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm.
Authors: | Jacob Junior AFL; Graduate Program in Electrical Engineering (PPGEE), Federal University of Maranhão (UFMA), São Luís, Maranhão, Brazil.; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil., do Carmo FA; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil., de Santana AL; Corporate R&D Headquarters, Fuji Electric Co., Tokyo, Japan., Santana EEC; Graduate Program in Electrical Engineering (PPGEE), Federal University of Maranhão (UFMA), São Luís, Maranhão, Brazil.; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil., Lobato FMF; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil.; Institute of Engineering and Geosciences, Federal University of Western Pará (UFOPA), Santarém, Pará, Brazil. |
---|---|
Language: | English |
Source: | PloS one [PLoS One] 2024 Jan 19; Vol. 19 (1), pp. e0297147. Date of Electronic Publication: 2024 Jan 19 (Print Publication: 2024). |
DOI: | 10.1371/journal.pone.0297147 |
Abstract: | Missing data is a prevalent problem that requires attention, as most data analysis techniques are unable to handle it. This is particularly critical in Multi-Label Classification (MLC), where only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be "drama" and "biography" simultaneously. One of the most common missing data treatment methods is data imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm for optimizing multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method in multi-label learning and evaluated its performance using six synthetic databases, considering various missing-value distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all scenarios, achieving the best evaluation measures in terms of Exact Match, Accuracy, and Hamming Loss. The superior results were consistent across different dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution to missing data treatment for multi-label learning. Competing Interests: The authors have declared that no competing interests exist. (Copyright: © 2024 Jacob Junior et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.) |
Database: | MEDLINE |
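
The abstract names three evaluation measures (Exact Match, Accuracy, and Hamming Loss) without defining them. As a minimal sketch, the block below computes the commonly used example-based versions of these measures on binary label vectors. The function names and the toy data are illustrative assumptions, not taken from the paper, and the paper may use different variants of these measures.

```python
# Illustrative sketch only: common example-based multi-label measures.
# Function names and toy data are assumptions, not the authors' code.

def exact_match(y_true, y_pred):
    """Fraction of instances whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    n_labels = len(y_true[0])
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * n_labels)


def accuracy(y_true, y_pred):
    """Mean per-instance Jaccard overlap between true and predicted label sets."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        # Convention assumed here: two empty label sets count as a perfect match.
        total += inter / union if union else 1.0
    return total / len(y_true)


# Toy data: 4 instances, 3 labels (e.g. drama, biography, comedy).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(exact_match(y_true, y_pred))   # 0.5: two of four rows match exactly
print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 12
print(accuracy(y_true, y_pred))      # 0.75: (1 + 0.5 + 0.5 + 1) / 4
```

Higher is better for Exact Match and Accuracy, while lower is better for Hamming Loss, so an imputation method that helps the downstream classifier should raise the first two and reduce the third.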