Clustering and classification for dry bean feature imbalanced data.
Authors: | Lee CY; School of Big Data, Fuzhou University of International Studies and Trade, Fuzhou, 350202, China. lqy@fzfu.edu.cn., Wang W; School of Software, Yunnan University, Kunming, 650000, China., Huang JQ; School of Big Data, Fuzhou University of International Studies and Trade, Fuzhou, 350202, China. |
---|---|
Language: | English |
Source: | Scientific reports [Sci Rep] 2024 Dec 28; Vol. 14 (1), pp. 31058. Date of Electronic Publication: 2024 Dec 28. |
DOI: | 10.1038/s41598-024-82253-6 |
Abstract: | Traditional machine learning methods such as decision tree (DT), random forest (RF), and support vector machine (SVM) have low classification performance on imbalanced data. This paper proposes an algorithm, applied to the dry bean dataset and the obesity levels dataset, that balances the minority and majority classes and adds a clustering step, with the aim of improving classification accuracy and other performance indicators (precision, recall, F1-score, and area under the curve (AUC)) on imbalanced data. The key idea is to combine two techniques: the borderline synthetic minority oversampling technique (BLSMOTE), which generates new samples from minority-class samples on the class boundary to reduce the impact of noise on model building, and K-means clustering, which divides the data into groups according to similarities or common features. The results show that the proposed BLSMOTE + K-means + SVM algorithm outperforms the traditional machine learning methods in classification accuracy and the other performance indicators. BLSMOTE + K-means + DT generates decision rules for the dry bean dataset and the obesity levels dataset, and BLSMOTE + K-means + RF ranks the importance of the explanatory variables. These experimental results can provide scientific evidence for decision-makers. Competing interests: The authors declare no competing interests. (© 2024. The Author(s).) |
Database: | MEDLINE |
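
The oversampling idea described in the abstract (generating new minority samples from those lying on the class boundary) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nearest` and `borderline_oversample`, the parameters `m`, `k`, `n_new`, and `seed`, and the toy data are all hypothetical, and the selection rule follows the commonly described Borderline-SMOTE scheme rather than anything stated in this record. The K-means and classifier stages are left out because the abstract does not say how they are combined.

```python
import math
import random


def nearest(point, candidates, k):
    """Return positions of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def borderline_oversample(X, y, minority, m=5, k=3, n_new=10, seed=0):
    """Sketch of borderline oversampling (hypothetical helper, not the paper's code).

    A minority sample counts as "borderline" when at least half, but not all,
    of its m nearest neighbours belong to another class. Samples whose
    neighbours are all from other classes are treated as noise and skipped.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]

    danger = []
    for i in min_idx:
        others = [j for j in range(len(X)) if j != i]
        neigh = nearest(X[i], [X[j] for j in others], m)
        n_major = sum(1 for t in neigh if y[others[t]] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)

    synthetic = []
    if not danger or len(min_idx) < 2:
        return synthetic, danger

    for _ in range(n_new):
        i = rng.choice(danger)
        mates = [j for j in min_idx if j != i]
        # Interpolate between the borderline sample and one of its
        # k nearest minority-class neighbours.
        picks = nearest(X[i], [X[j] for j in mates], k)
        j = mates[rng.choice(picks)]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic, danger


# Toy data (assumed for illustration): six majority points, two minority
# points next to them, and four minority points far away.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.5],
     [1.5, 0.5], [1.6, 0.6],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
y = [0] * 6 + [1] * 6

synthetic, danger = borderline_oversample(X, y, minority=1, m=3, k=2, n_new=4)
print("borderline minority indices:", danger)
print("synthetic samples:", synthetic)
```

In practice one would more likely rely on an existing implementation, for example `BorderlineSMOTE` in the imbalanced-learn package, and then pass the resampled data to a clustering and classification stage; whether the authors did so is not stated here.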