A New Under-Sampling Method to Face Class Overlap and Imbalance

Authors: Angélica Guzmán-Ponce, Rosa María Valdovinos, José Salvador Sánchez, José Raymundo Marcial-Romero
Language: English
Publication year: 2020
Subject:
Source: Applied Sciences, Vol 10, Iss 15, p 5164 (2020)
Document type: article
ISSN: 2076-3417
DOI: 10.3390/app10155164
Description: Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining, as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that handles class overlap and imbalance simultaneously, with the aim of improving classifier performance: first, the DBSCAN clustering algorithm removes noisy samples and cleans the decision boundary; then, a minimum spanning tree algorithm addresses the class imbalance. An extensive experimental study on both real-life and synthetic databases shows that the new algorithm performs significantly better than 12 state-of-the-art under-sampling methods with three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel).
Database: Directory of Open Access Journals
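
The two-stage idea in the description can be illustrated with a minimal plain-Python sketch. The abstract gives no algorithmic details, so everything below is an assumption rather than the authors' procedure: the function names (`dbscan_noise`, `mst_edges`, `undersample`) are hypothetical, DBSCAN is assumed to run on the majority class only, and the pruning rule in the second stage (dropping one endpoint of the shortest spanning-tree edges until the classes are balanced) is a simple stand-in chosen for illustration.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Indices DBSCAN would label as noise: not a core point and
    not within eps of any core point."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # A core point has at least min_pts neighbors, counting itself.
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return {i for i in range(n)
            if i not in core and not any(j in core for j in neighbors[i])}

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns a list of (weight, u, v) tree edges."""
    if len(points) < 2:
        return []
    # best[v] = (cheapest known distance from the tree to v, tree endpoint)
    best = {v: (math.dist(points[0], points[v]), 0) for v in range(1, len(points))}
    edges = []
    while best:
        v = min(best, key=lambda k: best[k][0])
        w, u = best.pop(v)
        edges.append((w, u, v))
        for k in best:
            d = math.dist(points[v], points[k])
            if d < best[k][0]:
                best[k] = (d, v)
    return edges

def undersample(majority, minority, eps, min_pts):
    """Stage 1: drop majority samples flagged as DBSCAN noise.
    Stage 2 (assumed rule): thin the majority class along its shortest
    MST edges until it is no larger than the minority class."""
    noise = dbscan_noise(majority, eps, min_pts)
    kept = [p for i, p in enumerate(majority) if i not in noise]
    target = max(len(minority), 1)
    while len(kept) > target:
        removed = set()
        for _, u, v in sorted(mst_edges(kept)):
            if len(kept) - len(removed) <= target:
                break
            # Short edges join near-duplicate samples; drop one endpoint.
            if u not in removed and v not in removed:
                removed.add(v)
        kept = [p for i, p in enumerate(kept) if i not in removed]
    return kept

majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
minority = [(3, 3), (3, 4), (4, 3)]
balanced = undersample(majority, minority, eps=1.5, min_pts=3)
```

In this toy example the isolated point `(10, 10)` is removed in the first stage, and the second stage thins the remaining cluster down to the size of the minority class. The pairwise distance computation is quadratic, so this sketch is only suitable for small data sets.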