Fast global k-means clustering using cluster membership and inequality
Authors: | Tsung-Jen Huang, Jim Z. C. Lai |
---|---|
Year of publication: | 2010 |
Subject: | Fuzzy clustering; Single-linkage clustering; Correlation clustering; Determining the number of clusters in a data set; Artificial Intelligence; CURE data clustering algorithm; Signal Processing; Canopy clustering algorithm; Computer Vision and Pattern Recognition; Cluster analysis; Algorithm; Software; k-medians clustering; Mathematics |
Source: | Pattern Recognition. 43:1954-1963 |
ISSN: | 0031-3203 |
DOI: | 10.1016/j.patcog.2009.11.021 |
Description: | In this paper, we present a fast global k-means clustering algorithm that makes use of the cluster membership and geometrical information of a data point. This algorithm is referred to as MFGKM. It uses a set of inequalities developed in this paper to determine a starting point for the jth cluster center of global k-means clustering. By adopting multiple cluster center selection (MCS) for MFGKM, we also develop a second clustering algorithm, called MFGKM+MCS. MCS determines more than one starting point at each cluster-split step, whereas the available fast and modified global k-means clustering algorithms select only one starting point per cluster split. Our proposed method MFGKM can obtain the lowest distortion, whereas MFGKM+MCS may give the shortest computing time. Compared to the modified global k-means clustering algorithm, MFGKM can reduce the computing time and the number of distance calculations by factors of 3.78-5.55 and 21.13-31.41, respectively, with an average distortion reduction of 5,487 for the Statlog data set. Compared to the fast global k-means clustering algorithm, MFGKM+MCS can reduce the computing time by a factor of 5.78-8.70, with an average distortion reduction of 30,564 on the same data set. The performance gains of the proposed methods are more pronounced when a higher-dimensional data set is divided into more clusters. |
Database: | OpenAIRE |
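
For orientation, the baseline that the description compares against can be sketched roughly as follows. This is not MFGKM: the inequalities and the MCS step from the paper are not reproduced here. It is a minimal pure-Python sketch of the commonly described fast global k-means idea, in which one starting point is chosen for each new cluster center by maximizing a guaranteed reduction in distortion, followed by ordinary k-means refinement. All function names (`sq_dist`, `lloyd`, `fast_global_kmeans`, and so on) are hypothetical and chosen only for this illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd(points, centers, max_iter=100):
    """Plain k-means (Lloyd) refinement starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def fast_global_kmeans(points, k):
    """Grow the solution from 1 to k centers, one starting point per step.

    Assumption: the candidate score is the usual lower bound on the
    distortion reduction, sum_j max(d_j - ||x_n - x_j||^2, 0), where d_j
    is the squared distance from x_j to its current nearest center.
    """
    centers = [mean(points)]  # the optimal single center is the data mean
    while len(centers) < k:
        d = [min(sq_dist(p, c) for c in centers) for p in points]

        def gain(candidate):
            return sum(max(dj - sq_dist(candidate, p), 0.0)
                       for p, dj in zip(points, d))

        start = max(points, key=gain)  # single starting point for the new center
        centers = lloyd(points, centers + [start])
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    found = fast_global_kmeans(data, 2)
    print(sorted(found), distortion(data, found))
```

Scoring every data point as a candidate costs a full pass over the data per candidate, which is the kind of distance computation the description says MFGKM reduces; the sketch makes no attempt at that saving.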