MuDi-Stream: A multi density clustering algorithm for evolving data stream
Authors: | Teh Ying Wah, Tutut Herawan, Hadi Saboohi, Amineh Amini |
---|---|
Year of publication: | 2016 |
Subjects: | DBSCAN, Data stream clustering, Data stream mining, Clustering high-dimensional data, Fuzzy clustering, Correlation clustering, Biclustering, CURE data clustering algorithm, Consensus clustering, Cluster analysis, k-medians clustering, Determining the number of clusters in a data set, Outlier, Canopy clustering algorithm, FLAME clustering, Affinity propagation, Data mining, Computer science, Computer Networks and Communications, Computer Science Applications, Hardware and Architecture, Engineering and technology, Information systems, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing |
Source: | Journal of Network and Computer Applications, 59:370-385 |
ISSN: | 1084-8045 |
DOI: | 10.1016/j.jnca.2014.11.007 |
Description: | Density-based methods have emerged as a worthwhile class of techniques for clustering data streams, and a number of density-based data stream clustering algorithms have been developed recently. However, existing density-based data stream clustering algorithms are not without problems: clustering quality decreases dramatically when the data contain a range of densities. This paper develops a new method, called MuDi-Stream. It is an online-offline algorithm with four main components. In the online phase, it keeps summary information about the evolving multi-density data stream in the form of core mini-clusters. The offline phase generates the final clusters using an adapted density-based clustering algorithm. A grid-based method is used as an outlier buffer to handle both noise and multi-density data, and it also reduces the merging time of clustering. The algorithm is evaluated on various synthetic and real-world datasets using different quality metrics, and its scalability is also compared. The experimental results show that the proposed method improves clustering quality in multi-density environments. |
Database: | OpenAIRE |
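The description names the online-offline pattern (fading core mini-clusters, a grid-based outlier buffer, and an offline density-based merge) without giving details. The Python sketch below illustrates that general pattern under loudly stated assumptions. It is not the authors' MuDi-Stream algorithm and does not model how MuDi-Stream adapts to multiple densities. The names `Summary`, `OnlinePhase` and `offline_clusters`, the exponential fading factor `2**(-DECAY * dt)`, and all parameter values (`DECAY`, `RADIUS`, `CELL`, `PROMOTE_WEIGHT`, `eps`) are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters (assumptions, not taken from the paper).
DECAY = 0.01          # fading rate lambda in the weight factor 2**(-lambda * dt)
RADIUS = 1.0          # max distance for absorbing a point into a mini-cluster
CELL = 1.0            # grid cell width of the outlier buffer
PROMOTE_WEIGHT = 2.5  # buffered cell weight at which it becomes a mini-cluster


@dataclass
class Summary:
    """Fading summary: weighted linear sum, total weight, last update time."""
    ls: list
    weight: float
    last: int

    def center(self):
        return [s / self.weight for s in self.ls]

    def absorb(self, x, t):
        # Fade the existing summary to time t, then add the new point with weight 1.
        f = 2 ** (-DECAY * (t - self.last))
        self.ls = [s * f + v for s, v in zip(self.ls, x)]
        self.weight = self.weight * f + 1.0
        self.last = t


class OnlinePhase:
    """Hypothetical online phase: mini-clusters plus a grid-based outlier buffer."""

    def __init__(self):
        self.minis = []   # core mini-cluster summaries
        self.buffer = {}  # grid cell -> Summary of points not yet dense enough

    def insert(self, x, t):
        # 1) Try to absorb the point into the nearest existing mini-cluster.
        if self.minis:
            best = min(self.minis, key=lambda m: math.dist(m.center(), x))
            if math.dist(best.center(), x) <= RADIUS:
                best.absorb(x, t)
                return
        # 2) Otherwise park the point in its grid cell of the outlier buffer.
        cell = tuple(math.floor(v / CELL) for v in x)
        if cell in self.buffer:
            self.buffer[cell].absorb(x, t)
        else:
            self.buffer[cell] = Summary(list(x), 1.0, t)
        # 3) Promote the cell to a mini-cluster once its faded weight is high enough.
        if self.buffer[cell].weight >= PROMOTE_WEIGHT:
            self.minis.append(self.buffer.pop(cell))


def offline_clusters(minis, eps=2 * RADIUS):
    """Label mini-clusters whose centers are chained within eps of each other.

    This equals DBSCAN with minPts = 1 over mini-cluster centers (connected
    components), a simplified stand-in for the paper's adapted offline step.
    """
    centers = [m.center() for m in minis]
    labels = [-1] * len(minis)
    label = 0
    for i in range(len(minis)):
        if labels[i] != -1:
            continue
        labels[i] = label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(minis)):
                if labels[k] == -1 and math.dist(centers[j], centers[k]) <= eps:
                    labels[k] = label
                    stack.append(k)
        label += 1
    return labels


# Usage: two small dense groups and one isolated point, one point per time step.
stream = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.25, 0.3),
          (5.1, 5.0), (5.2, 5.3), (5.4, 5.1), (9.0, 0.0)]
online = OnlinePhase()
for t, x in enumerate(stream):
    online.insert(x, t)
print(offline_clusters(online.minis))  # prints [0, 1]
```

In this run each dense group is first collected in a grid cell and promoted to a mini-cluster on its third point, while the isolated point `(9.0, 0.0)` stays in the outlier buffer as noise. Keeping sparse points in grid cells rather than creating a mini-cluster for each one is what keeps the offline merge small in this sketch.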