Classification performance assessment for imbalanced multiclass data.
Authors: | Aguilar-Ruiz JS; School of Engineering, Pablo de Olavide University, 41013, Seville, Spain. aguilar@upo.es., Michalak M; Department of Computer Networks and Systems, Silesian University of Technology, ul. Akademicka 16, 44-100, Gliwice, Poland. |
---|---|
Language: | English |
Source: | Scientific reports [Sci Rep] 2024 May 10; Vol. 14 (1), pp. 10759. Date of Electronic Publication: 2024 May 10. |
DOI: | 10.1038/s41598-024-61365-z |
Abstract: | The evaluation of diagnostic systems is pivotal for ensuring the deployment of high-quality solutions, especially given the pronounced context-sensitivity of certain systems, particularly in fields such as biomedicine. Of notable importance are predictive models where the target variable can encompass multiple values (multiclass), especially when these classes exhibit substantial frequency disparities (imbalance). In this study, we introduce the Imbalanced Multiclass Classification Performance (IMCP) curve, specifically designed for multiclass datasets (unlike the ROC curve), and characterized by its resilience to class distribution variations (in contrast to accuracy or Fβ-score). Moreover, the IMCP curve facilitates individual performance assessment for each class within the diagnostic system, shedding light on the confidence associated with each prediction, an aspect of particular significance in medical diagnosis. Empirical experiments conducted with real-world data in a multiclass context (involving 35 types of tumors) featuring a high level of imbalance demonstrate that both the IMCP curve and the area under the IMCP curve serve as excellent indicators of classification quality. (© 2024. The Author(s).) |
Database: | MEDLINE |