Popis: |
With the advancement of miniature sensors, wireless networking and context awareness, the importance of data-intensive computing is on the rise, with practical applications such as web categorization and data mining. One of the critical challenges in data-intensive computing is data clustering, as effective clustering algorithm will enable researchers and automated systems to analyze and organize massive amount of data much more efficiently. Many data clustering algorithms already exist, but most require a priori knowledge on the number of classes to guide the clustering process. We propose~\emph{$Auto\_Ant\_TM^{s}\_Shape$}, a two-phase algorithm, for automatically forming optimal number of clusters with arbitrary shapes. The first phase uses the hybrid approach of K-means and enhanced Ant-based template mechanism to generate small seed clusters with high purity in each cluster. In the second phase, small clusters are iteratively merged to obtain the final clusters using a merging algorithm. We apply \emph{$Auto\_Ant\_TM^{s}\_Shape$} to 8 widely-used datasets, and compare the clustering results with two approaches based on density-based algorithm (DBSCAN) and Particle Swarm Optimization (PSO). The results show that \emph{$Auto\_Ant\_TM^{s}\_Shape$} is very effective and thus achieve good clustering results in near optimal number of clusters without knowing the number of classes in advance. |