ADDPC-SMOTE: An Oversampling Algorithm Based on Density Difference Peak Clustering and Spatial Distribution Entropy

Authors: Wei Wang, Fen Liu
Language: English
Publication year: 2023
Subject:
Source: IEEE Access, Vol 11, Pp 108152-108166 (2023)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3320265
Description: Most existing clustering-based oversampling algorithms do not consider the spatial distribution of the majority class, so they tend to produce class overlap and to ignore important information points when synthesizing new samples. To solve this problem, this paper analyzes the influence of spatial distribution on the oversampling process and proposes an oversampling algorithm based on Adaptive Density Difference Peak Clustering and Spatial Distribution Entropy. First, the spatial distribution of the samples of both classes is introduced into the clustering process, and the minority class is clustered by the peaks of the local density difference, so that sub-cluster centers are selected in a principled way and class overlap is reduced. At the same time, the practice of setting the truncation distance from prior experience is replaced: the spatial distribution of the two classes is characterized by constructing a Spatial Distribution Entropy, on the basis of which the truncation distance is selected and optimized automatically. Then, boundary points and sparse points are screened according to the absolute value of the local density difference, and the sampling probability of each minority class sample is determined so that sampling focuses on these important information points. Finally, Spatial Distribution Entropy is used to evaluate the set of synthetic samples to ensure that it balances the distribution of the two classes in the dataset. To test the effectiveness of the algorithm, it is compared with five oversampling algorithms on four classifiers and 16 common datasets. The results show that, compared with SMOTE, K-means-SMOTE, BS-SMOTE, ADASYN, and DPC-SMOTE, the proposed algorithm improves significantly on all evaluation metrics.
Database: Directory of Open Access Journals
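
Illustration: the description above mentions weighting minority samples by the absolute value of a local density difference before synthesizing new points. The following is a minimal sketch of that idea only, not the authors' implementation. The abstract gives no formulas, so everything here is an assumption: the count-based density within a cutoff `d_c`, the weight `1 / (1 + |delta|)` (chosen so that boundary and sparse points, where the two densities are close, are sampled more often), and the hypothetical function names. The density peak clustering step and the Spatial Distribution Entropy are omitted; new points are produced by plain SMOTE-style interpolation.

```python
import math
import random


def local_density_difference(minority, majority, d_c):
    """For each minority point: (# minority neighbours) - (# majority neighbours)
    within cutoff distance d_c. Count-based density is an assumption."""
    deltas = []
    for i, x in enumerate(minority):
        rho_min = sum(1 for j, y in enumerate(minority)
                      if j != i and math.dist(x, y) < d_c)
        rho_maj = sum(1 for y in majority if math.dist(x, y) < d_c)
        deltas.append(rho_min - rho_maj)
    return deltas


def sampling_probabilities(deltas):
    """Assumed weighting: a smaller |delta| gives a higher sampling probability."""
    weights = [1.0 / (1.0 + abs(d)) for d in deltas]
    total = sum(weights)
    return [w / total for w in weights]


def oversample(minority, majority, n_new, d_c, k=3, seed=0):
    """Draw seed points by the probabilities above, then interpolate towards
    one of the k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    probs = sampling_probabilities(
        local_density_difference(minority, majority, d_c))
    k = min(k, len(minority) - 1)
    indices = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(indices, weights=probs)[0]
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted((j for j in indices if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        y = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, y)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 3.0)]
majority = [(3.0, 3.2), (3.1, 2.9), (5.0, 5.0)]
new_points = oversample(minority, majority, n_new=10, d_c=1.1)
```

In this toy data the isolated minority point at (3, 3) sits among majority points, so its density difference is negative while the four clustered points have a positive one. A fuller version would choose `d_c` automatically, which the paper does through its Spatial Distribution Entropy.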