An efficient automated incremental density-based algorithm for clustering and classification

Autor: Mehdi Hosseinzadeh, Elham Azhir, Nima Jafari Navimipour, Arash Sharifi, Aso Mohammad Darwesh
Rok vydání: 2021
Předmět:
Zdroj: Future Generation Computer Systems. 114:665-678
ISSN: 0167-739X
DOI: 10.1016/j.future.2020.08.031
Popis: Data clustering divides the datasets into different groups. Incremental Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a famous density-based clustering technique able to find the clusters of variable sizes and shapes. The quality of incremental DBSCAN results has been influenced by two input parameters: MinPts (Minimum Points) and Eps (Epsilon). Therefore, the parameter setting is one of the major problems of incremental DBSCAN. In the present article, an improved incremental DBSCAN accorded to Non-dominated Sorting Genetic Algorithm II (NSGA-II) has been presented to address the issue. The proposed algorithm adjusts the two parameters (MinPts and Eps) of the incremental DBSCAN via the iteration and the fitness functions to enhance the clustering precision. Moreover, our proposed method introduces suitable fitness functions for both labeled and unlabeled datasets. We have also improved the efficiency of the proposed hybrid algorithm by parallelization of the optimization process. The evaluation of the introduced method has been done through some textual and numerical datasets with different shapes, sizes, and dimensions. According to the experimental results, the proposed algorithm provides better results than Multi-Objective Particle Swarm Optimization (MOPSO) based incremental DBSCAN and a few well-known techniques, particularly regarding the shape and balanced datasets. Also, good speed-up can be reached with a parallel model compared with the serial version of the algorithm.
Databáze: OpenAIRE