Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Authors:	M. Emre Celebi, Hassan A. Kingravi
Year of publication:	2014
Subject:	Computer science; k-means clustering; Initialization; 02 engineering and technology; Invariant (physics); Minimax; computer.software_genre; 01 natural sciences; 010104 statistics & probability; Data point; 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Data mining; 0101 mathematics; Invariant (mathematics); Cluster analysis; computer; Time complexity
Source:	Partitional Clustering Algorithms ISBN: 9783319092584
DOI:	10.1007/978-3-319-09259-1_3
Description:	Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm’s sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::6311b60ede30621b9d9de265db5a8f66 https://doi.org/10.1007/978-3-319-09259-1_3