A NEW DENSITY BASED SAMPLING TO ENHANCE DBSCAN CLUSTERING ALGORITHM
Authors: | Israa S. Kamil, Safaa O. Al-Mamory |
---|---|
Year of publication: | 2019 |
Subject: |
DBSCAN
General Computer Science; Computer science; Sampling (statistics); 02 engineering and technology; 010502 geochemistry & geophysics; 01 natural sciences; ComputingMethodologies_PATTERNRECOGNITION; Outlier; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Noise (video); Cluster analysis; Algorithm; Time complexity; 0105 earth and related environmental sciences; Sparse matrix; Sampling bias |
Source: | Malaysian Journal of Computer Science. 32:315-327 |
ISSN: | 0127-9084 |
DOI: | 10.22452/mjcs.vol32no4.5 |
Description: | DBSCAN is one of the efficient density-based clustering algorithms. It can discover clusters of different shapes and sizes and separate noise and outliers. However, when the dataset contains regions of different densities, DBSCAN clustering becomes ineffective. In this paper, we propose an approach that enables DBSCAN to cluster datasets with different densities by preprocessing the dataset so that it has a single density level. The system is composed of four stages. First, a new approach to separating the dataset based on density is presented. Second, a new density-biased sampling technique is proposed. Third, the sparse data resulting from the first two stages is clustered with DBSCAN. Finally, the data left over from sampling is clustered with KNN. Experimental results on synthetic and real datasets show that, on average, the clustering of the proposed algorithm is better than that of DBSCAN by more than 7% while retaining the time complexity of DBSCAN. |
Database: | OpenAIRE |
External link: |
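
The four stages named in the description can be illustrated with a small, standard-library-only Python sketch. This is not the authors' method: the record does not give their separation or sampling rules, so the nearest-neighbour threshold in stage 1, the greedy thinning in stage 2, and the names `cluster_multi_density`, `spacing`, `eps`, `min_pts`, and `k` are all assumptions made for illustration. Only the overall shape (separate by density, sample the dense part, run DBSCAN on the sample, assign the rest with KNN) follows the abstract.

```python
import math
from collections import Counter, deque

def dbscan(pts, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    nbrs = [[j for j, q in enumerate(pts) if math.dist(p, q) <= eps] for p in pts]
    labels = [None] * len(pts)
    cluster = -1
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:        # not a core point: tentatively noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:           # earlier "noise" is really a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:   # only core points expand the cluster
                queue.extend(nbrs[j])
    return labels

def cluster_multi_density(points, spacing, eps, min_pts, k=3):
    # Stage 1 (assumed rule): a point is "dense" if its nearest neighbour
    # is closer than `spacing`; everything else is "sparse".
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    dense = [i for i, d in enumerate(nn) if d < spacing]
    sparse = [i for i, d in enumerate(nn) if d >= spacing]
    # Stage 2 (assumed rule): greedily thin the dense part so that kept
    # points are at least `spacing` apart, i.e. roughly one density level.
    kept = []
    for i in dense:
        if all(math.dist(points[i], points[j]) >= spacing for j in kept):
            kept.append(i)
    # Stage 3: run DBSCAN on the sparse points plus the thinned dense sample.
    sample = sorted(sparse + kept)
    labels = [None] * len(points)
    for i, lab in zip(sample, dbscan([points[i] for i in sample], eps, min_pts)):
        labels[i] = lab
    # Stage 4: give each left-over point the majority label of its k nearest
    # sampled points (a simple KNN vote).
    for i in dense:
        if labels[i] is None:
            near = sorted(sample, key=lambda j: math.dist(points[i], points[j]))[:k]
            labels[i] = Counter(labels[j] for j in near).most_common(1)[0][0]
    return labels

# Toy data: a dense 6x6 grid, a sparse 3x3 grid, and one far-away outlier.
dense_pts = [(0.5 * i, 0.5 * j) for i in range(6) for j in range(6)]
sparse_pts = [(10.0 + i, 10.0 + j) for i in range(3) for j in range(3)]
points = dense_pts + sparse_pts + [(50.0, 50.0)]
labels = cluster_multi_density(points, spacing=1.0, eps=1.1, min_pts=3)
print(sorted(Counter(labels).items()))
```

On this toy input the dense grid and the sparse grid each come out as one cluster and the isolated point is marked as noise. The pairwise distance computations are quadratic and meant only for small examples; the sketch makes no claim about the time complexity reported in the paper.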