An efficient k-means clustering algorithm: analysis and implementation
Autor: | Angela Y. Wu, David M. Mount, Tapas Kanungo, Ruth Silverman, Nathan S. Netanyahu, Christine D. Piatko |
---|---|
Rok vydání: | 2002 |
Předmět: |
Clustering high-dimensional data
DBSCAN Linde–Buzo–Gray algorithm Fuzzy clustering Computer science Correlation clustering Artificial Intelligence Ramer–Douglas–Peucker algorithm CURE data clustering algorithm Nearest-neighbor chain algorithm Cluster analysis Difference-map algorithm k-medians clustering FSA-Red Algorithm k-medoids Heuristic business.industry Applied Mathematics k-means clustering Constrained clustering Pattern recognition Image segmentation Lloyd's algorithm Data set Determining the number of clusters in a data set k-d tree Data stream clustering Computational Theory and Mathematics Canopy clustering algorithm Affinity propagation Computer Vision and Pattern Recognition Artificial intelligence business Software |
Zdroj: | IEEE Transactions on Pattern Analysis and Machine Intelligence. 24:881-892 |
ISSN: | 0162-8828 |
Popis: | In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation. |
Databáze: | OpenAIRE |
Externí odkaz: |