k-PbC: an improved cluster center initialization for categorical data clustering
Authors: | Duy-Tai Dinh, Van-Nam Huynh |
---|---|
Year of publication: | 2020 |
Subject: | Measure (data warehouse); Computer science; Initialization; 02 engineering and technology; computer.software_genre; Data set; Set (abstract data type); ComputingMethodologies_PATTERNRECOGNITION; Artificial Intelligence; Kernel (statistics); 0202 electrical engineering, electronic engineering, information engineering; Cluster (physics); 020201 artificial intelligence & image processing; Data mining; Cluster analysis; Categorical variable; computer |
Source: | Applied Intelligence. 50:2610-2632 |
ISSN: | 1573-7497 0924-669X |
Description: | The performance of a partitional clustering algorithm depends on the initial random choice of cluster centers, so different runs on the same data set often yield different results. This paper addresses that problem by proposing an algorithm named k-PbC, which uses non-random initialization based on pattern mining to improve clustering quality. Specifically, k-PbC first applies maximal frequent itemset mining to find a set of initial clusters. It then uses a kernel-based method to form cluster centers and an information-theoretic dissimilarity measure to estimate the distance between cluster centers and data objects. An extensive experimental study on various real categorical data sets compares k-PbC with state-of-the-art categorical clustering algorithms in terms of clustering quality. The comparative results show that the proposed initialization method can enhance clustering results and that k-PbC outperforms the compared algorithms on both internal and external validation metrics. |
Database: | OpenAIRE |
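The first step named in the description, maximal frequent itemset mining, can be illustrated with a minimal sketch. This is not the k-PbC implementation: the brute-force search, the toy records, the `min_support` threshold, and the function name `maximal_frequent_itemsets` are all assumptions made for illustration, and the way records are grouped under each itemset is only one plausible reading of "initial clusters".

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search for maximal frequent itemsets (toy data only).

    An itemset is frequent if at least `min_support` transactions contain it,
    and maximal if no proper superset of it is also frequent.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent.append(cand)
    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical categorical records, encoded as attribute=value items.
records = [
    frozenset({"color=red", "shape=round", "size=small"}),
    frozenset({"color=red", "shape=round", "size=large"}),
    frozenset({"color=blue", "shape=square", "size=small"}),
    frozenset({"color=blue", "shape=square", "size=large"}),
    frozenset({"color=red", "shape=round", "size=small"}),
]

mfis = maximal_frequent_itemsets(records, min_support=2)

# Assumed grouping: each itemset collects the indices of records containing it.
# Groups may overlap; how k-PbC resolves that is not stated in this record.
initial_groups = {
    tuple(sorted(m)): [i for i, r in enumerate(records) if m <= r] for m in mfis
}

for pattern, members in initial_groups.items():
    print(pattern, members)
```

Because the patterns are derived deterministically from the data, repeated runs start from the same groups, which is the property the description contrasts with random initialization. The exhaustive enumeration is exponential in the number of items, so a real data set would need a dedicated mining algorithm.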