Automatic Data Clustering Analysis of Arbitrary Shape with K-Means and Enhanced Ant-Based Template Mechanism

Author:	Wei Zhang, Hen-I Yang, Hsin-yi Jiang, Carl K. Chang
Year of publication:	2012
Subject:	DBSCAN; Categorization; Computer science; Correlation clustering; k-means clustering; Cluster (physics); Particle swarm optimization; Algorithm design; Data mining; Cluster analysis; Algorithm
Source:	COMPSAC
DOI:	10.1109/compsac.2012.66
Description:	With the advancement of miniature sensors, wireless networking and context awareness, the importance of data-intensive computing is on the rise, with practical applications such as web categorization and data mining. One of the critical challenges in data-intensive computing is data clustering, as an effective clustering algorithm enables researchers and automated systems to analyze and organize massive amounts of data much more efficiently. Many data clustering algorithms already exist, but most require a priori knowledge of the number of classes to guide the clustering process. We propose Auto_Ant_TM^s_Shape, a two-phase algorithm for automatically forming an optimal number of clusters with arbitrary shapes. The first phase uses a hybrid approach of K-means and an enhanced Ant-based template mechanism to generate small seed clusters with high purity in each cluster. In the second phase, the small clusters are iteratively merged by a merging algorithm to obtain the final clusters. We apply Auto_Ant_TM^s_Shape to 8 widely used datasets and compare the clustering results with two approaches, one based on a density-based algorithm (DBSCAN) and one on Particle Swarm Optimization (PSO). The results show that Auto_Ant_TM^s_Shape is effective, achieving good clustering results with a near-optimal number of clusters without knowing the number of classes in advance.
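Illustration:	The two-phase idea in the description (over-segment into small seed clusters, then merge them so that the final cluster count is not fixed in advance) can be sketched as below. This is a simplified stand-in and not the authors' Auto_Ant_TM^s_Shape: the Ant-based template mechanism is not reproduced, phase 1 is plain K-means with deterministic initial centroids, and phase 2 swaps in a single-linkage distance threshold as the merge rule. The function names `kmeans_seeds` and `merge_seeds`, the `threshold` parameter and the toy data are all assumptions made for this example.

```python
from math import dist


def kmeans_seeds(points, k, iters=20):
    """Phase 1 stand-in: plain K-means that over-segments into k seed clusters."""
    n = len(points)
    # Assumption: evenly spaced initial centroids, chosen only for reproducibility.
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def merge_seeds(points, labels, threshold):
    """Phase 2 stand-in: merge seed clusters whose closest points lie within threshold."""
    seeds = sorted(set(labels))
    parent = {s: s for s in seeds}  # union-find over seed cluster ids

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and dist(p, points[j]) <= threshold:
                parent[find(labels[i])] = find(labels[j])

    # Relabel merged groups as 0, 1, 2, ...
    roots = sorted({find(s) for s in seeds})
    relabel = {r: idx for idx, r in enumerate(roots)}
    return [relabel[find(lab)] for lab in labels]


# Toy data (hypothetical): two 5x5 grids of points, far apart from each other.
blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
blob_b = [(x + 20.0, y) for x, y in blob_a]
points = blob_a + blob_b

seed_labels = kmeans_seeds(points, k=6)
final_labels = merge_seeds(points, seed_labels, threshold=1.5)
print(len(set(seed_labels)), "seed clusters ->", len(set(final_labels)), "final clusters")
```

Because merging follows chains of nearby seed clusters rather than distance to a single centroid, this kind of second phase can recover elongated or otherwise non-spherical groups, and the number of final clusters falls out of the merge rule instead of being supplied by the user. The threshold itself still has to be chosen, which is one place where the sketch is cruder than the method the description refers to.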
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::7d2fefae632c56405dd346393acdb0bf https://doi.org/10.1109/compsac.2012.66