Author: |
Angélica Guzmán-Ponce, Rosa María Valdovinos, José Salvador Sánchez, José Raymundo Marcial-Romero |
Language: |
English |
Year of publication: |
2020 |
Source: |
Applied Sciences, Vol 10, Iss 15, p 5164 (2020) |
Document type: |
article |
ISSN: |
2076-3417 |
DOI: |
10.3390/app10155164 |
Description: |
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining, as they may cause a significant loss in performance. Several solutions have been proposed to address both difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that handles class overlap and class imbalance simultaneously, with the aim of improving classifier performance. It combines the DBSCAN clustering algorithm, which removes noisy samples and cleans the decision boundary, with a minimum spanning tree algorithm, which addresses the class imbalance. An extensive experimental study shows that the new algorithm performs significantly better than 12 state-of-the-art under-sampling methods when using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases. |
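
The description names the two stages but not their details, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that stage one drops majority-class points that meet the standard DBSCAN noise criterion, and that stage two builds a minimum spanning tree over the remaining majority points and removes one endpoint of the shortest tree edges until the majority class is no larger than the minority class. The function names (`dbscan_noise`, `mst_edges`, `two_stage_undersample`), the parameters `eps` and `min_pts`, and the edge-pruning rule are all assumptions made for illustration.

```python
import math


def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan_noise(points, eps, min_pts):
    """Flag DBSCAN noise: a point that is not a core point and has no
    core point within eps. Only the noise criterion is computed here;
    cluster labels are not needed for this sketch."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if euclid(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A point counts itself among its neighbors.
    core = [len(nb) >= min_pts for nb in neighbors]
    return [not core[i] and not any(core[j] for j in neighbors[i])
            for i in range(n)]


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns a list of (length, i, j) tree edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    best = [(euclid(points[0], points[k]), 0) for k in range(n)]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best[k][0])
        edges.append((best[j][0], best[j][1], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = euclid(points[j], points[k])
                if d < best[k][0]:
                    best[k] = (d, j)
    return edges


def two_stage_undersample(majority, minority, eps=1.5, min_pts=3):
    """Hypothetical two-stage under-sampling of the majority class."""
    # Stage 1 (assumed): discard majority points flagged as noise.
    noise = dbscan_noise(majority, eps, min_pts)
    kept = [p for p, is_noise in zip(majority, noise) if not is_noise]

    # Stage 2 (assumed): prune along short MST edges until balanced.
    target = max(len(minority), 1)
    while len(kept) > target:
        removed = set()
        for _, i, j in sorted(mst_edges(kept)):
            if len(kept) - len(removed) <= target:
                break
            if i not in removed and j not in removed:
                removed.add(j)  # treat one endpoint as redundant
        kept = [p for k, p in enumerate(kept) if k not in removed]
    return kept


# Toy data: a small majority cluster with one far outlier.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 1), (10, 10)]
minority = [(3, 3), (3, 4), (4, 3)]
balanced = two_stage_undersample(majority, minority)
print(len(balanced), "majority samples kept")
```

The tree is rebuilt after every pruning pass, which keeps the sketch short but is too slow for large datasets. A practical version would likely rely on an optimized DBSCAN and a sparse-graph minimum spanning tree routine.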
Database: |
Directory of Open Access Journals |