Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification

Authors: Dian Jin, Dehong Xie, Di Liu, Murong Gong
Publication year: 2023
Subject:
Source: Intelligent Data Analysis. 27:635-652
ISSN: 1571-4128, 1088-467X
DOI: 10.3233/ida-226612
Description: The Synthetic Minority Oversampling Technique (SMOTE) and several of its extensions are widely used to balance imbalanced data. In this study, we concentrate on the overfitting of the classification model that arises when the instances chosen for oversampling increase the overlap with the majority class. Our method, the Clustering-based Improved Adaptive Synthetic Minority Oversampling Technique (CI-ASMOTE1), decomposes the minority instances into sub-clusters according to their connectivity in the feature space and then selects the minority sub-clusters that are relatively close to the decision boundary as candidate regions for oversampling. With CI-ASMOTE1, new minority instances are synthesized only within each connected region of the selected sub-clusters. To account for the diversity of the synthetic instances in each selected sub-cluster, CI-ASMOTE2 extends CI-ASMOTE1 by keeping all features of those instances as different as possible in the feature space. The experimental evaluation shows that CI-ASMOTE1 and CI-ASMOTE2 improve on SMOTE and its extensions, especially when the minority and majority instances overlap.
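The general idea in the description (split the minority class into connected sub-clusters, keep those near the majority class, and interpolate only inside them) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the connectivity rule or the boundary criterion, so the `radius` threshold, the mean nearest-majority distance used as a closeness score, the `boundary_quantile` cutoff, and both function names are assumptions made for illustration. The diversity step of CI-ASMOTE2 is not covered.

```python
import numpy as np


def connected_subclusters(adjacent):
    """Label points by connected components of a boolean adjacency matrix."""
    n = adjacent.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal from an unlabeled point.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def oversample_near_boundary(X_min, X_maj, n_new, radius,
                             boundary_quantile=0.5, seed=0):
    """Synthesize minority points inside sub-clusters close to the majority class.

    Assumptions (not taken from the paper): two minority points are connected
    when they lie within `radius`; a sub-cluster is "close to the boundary"
    when its mean nearest-majority distance is at or below the
    `boundary_quantile` quantile over all sub-clusters.
    """
    rng = np.random.default_rng(seed)

    # Pairwise minority distances define the connectivity graph.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    adjacent = dist <= radius
    labels = connected_subclusters(adjacent)

    # Distance from each minority point to its nearest majority point.
    d_maj = np.linalg.norm(
        X_min[:, None, :] - X_maj[None, :, :], axis=2
    ).min(axis=1)

    cluster_ids = np.unique(labels)
    cluster_dist = np.array([d_maj[labels == c].mean() for c in cluster_ids])
    threshold = np.quantile(cluster_dist, boundary_quantile)

    # Keep near-boundary sub-clusters with at least two members.
    selected = [c for c, d in zip(cluster_ids, cluster_dist)
                if d <= threshold and np.sum(labels == c) >= 2]
    if not selected:
        return np.empty((0, X_min.shape[1]))

    synthetic = []
    for _ in range(n_new):
        c = rng.choice(selected)
        i = rng.choice(np.flatnonzero(labels == c))
        # Interpolate only toward a directly connected neighbor, so the new
        # point stays on an edge of the sub-cluster's connectivity graph.
        neighbors = np.flatnonzero(adjacent[i])
        neighbors = neighbors[neighbors != i]
        j = rng.choice(neighbors)
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Interpolating only between directly connected neighbors is a deliberate choice in this sketch: plain SMOTE-style interpolation between arbitrary members of a sub-cluster could place points outside its connected region.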
Database: OpenAIRE