A method of two-stage clustering learning based on improved DBSCAN and density peak algorithm

Authors: Xuming Han, Mingyang Li, Xinhua Bi, Limin Wang
Publication year: 2021
Subject:
Source: Computer Communications. 167:75-84
ISSN: 0140-3664
DOI: 10.1016/j.comcom.2020.12.019
Description: Density peak (DP) and density-based spatial clustering of applications with noise (DBSCAN) are representative density-based clustering algorithms in unsupervised learning. Both can cluster data of arbitrary shape and identify noise samples in a data set. However, the DP algorithm relies on a decision graph to select cluster centers, so users without prior knowledge find it difficult to identify the centers automatically and accurately. The clustering performance of DBSCAN, in turn, is highly sensitive to the settings of its two parameters, Eps and MinPts. To address these issues, we propose a new two-stage clustering method based on improved DBSCAN and the DP algorithm (TSCM). In the first stage, an improved DBSCAN algorithm based on bat optimization generates the initial clusters. Specifically, the improved DBSCAN uses Silhouette, a well-known internal clustering validation index that requires no labels, as the fitness function that guides parameter determination by bat optimization. In the second stage, the cluster centers in the decision graph are selected automatically according to the initial clusters, and the final clusters are obtained by DP with these centers. The experiments show that, compared with DP and DBSCAN, TSCM effectively removes the need for manual cluster center selection in DP and manual parameter setting in DBSCAN, and that clustering performance improves significantly.
Database: OpenAIRE
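
The two-stage idea outlined in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: an exhaustive grid search stands in for bat optimization, the center of each initial cluster is taken to be its member with the highest Gaussian-kernel density, and the DP cutoff distance `dc` is set to the tuned Eps. The abstract specifies none of these details. All function names (`dbscan`, `silhouette`, `tune_dbscan`, `density_peak_assign`, `two_stage_cluster`) are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 marks noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels, cluster = [None] * n, -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1                      # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:         # only core points expand
                queue.extend(nbrs[j])
    return labels

def silhouette(points, labels):
    """Mean Silhouette over non-noise points; -1.0 if fewer than 2 clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total, count = 0.0, 0
    for lab, members in clusters.items():
        for i in members:
            count += 1
            if len(members) == 1:
                continue                        # singleton contributes 0
            a = sum(math.dist(points[i], points[j]) for j in members)
            a /= len(members) - 1
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for l2, other in clusters.items() if l2 != lab)
            total += (b - a) / max(a, b)
    return total / count

def tune_dbscan(points, eps_grid, min_pts_grid):
    """Stage 1: choose (eps, min_pts) that maximizes Silhouette.
    ASSUMPTION: grid search replaces the bat optimization of the paper."""
    best_score, best_labels, best_params = -2.0, None, None
    for eps in eps_grid:
        for min_pts in min_pts_grid:
            labels = dbscan(points, eps, min_pts)
            score = silhouette(points, labels)
            if score > best_score:
                best_score, best_labels, best_params = score, labels, (eps, min_pts)
    return best_labels, best_params

def density_peak_assign(points, init_labels, dc):
    """Stage 2: DP-style assignment with centers taken from initial clusters.
    ASSUMPTION: a cluster's center is its member with the highest density."""
    n = len(points)
    rho = [sum(math.exp(-(math.dist(points[i], points[j]) / dc) ** 2)
               for j in range(n) if j != i) for i in range(n)]
    centers = {}
    for i, lab in enumerate(init_labels):
        if lab != -1 and (lab not in centers or rho[i] > rho[centers[lab]]):
            centers[lab] = i
    if not centers:
        return [-1] * n                         # stage 1 found no cluster
    final = [None] * n
    for lab, i in centers.items():
        final[i] = lab
    order = sorted(range(n), key=lambda i: -rho[i])
    for pos, i in enumerate(order):
        if final[i] is not None:
            continue
        # DP rule: inherit the label of the nearest point of higher density.
        denser = order[:pos] or list(centers.values())
        nearest = min(denser, key=lambda j: math.dist(points[i], points[j]))
        final[i] = final[nearest]
    return final

def two_stage_cluster(points, eps_grid, min_pts_grid):
    init_labels, (eps, _) = tune_dbscan(points, eps_grid, min_pts_grid)
    return density_peak_assign(points, init_labels, dc=eps)  # ASSUMPTION: dc = eps

# Toy data: two well-separated 3x3 grids of points.
blob_a = [(float(x), float(y)) for x in range(3) for y in range(3)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
points = blob_a + blob_b
labels = two_stage_cluster(points, eps_grid=[0.5, 1.0, 1.5, 2.0, 3.0],
                           min_pts_grid=[2, 3, 4, 5])
```

In this sketch the second stage relabels every point, including those the first stage marked as noise, because the DP rule always propagates a label from a denser neighbor. Whether TSCM treats noise this way is not stated in the abstract.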