MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data

Authors: Jiahao Wang, Xiaoze Feng, Song Chen, Ruiyan Gao, Chunying Zhang, Jing Ren, Fengchun Liu
Year of publication: 2021
Source: Intelligent Data Analysis. 25:1507-1524
ISSN: 1571-4128, 1088-467X
DOI: 10.3233/ida-205340
Description: To address the clustering of incomplete categorical matrix data sets while accounting for the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. First, the related theory of set pair information granules is introduced into k-modes clustering: the distance formula of the traditional k-modes algorithm is modified to define a set pair distance measure between incomplete matrix samples. Second, to capture the uncertain relationship between a sample and a cluster, the intra-cluster average distance is defined, together with a threshold formula that determines whether a sample belongs to multiple clusters; the resulting set pair clustering consists of a positive region, a boundary region and a negative region. Finally, experiments on three selected data sets against four comparison algorithms show that the set pair k-modes clustering algorithm handles incomplete categorical matrix data sets effectively and achieves good clustering performance in terms of Accuracy, Recall, ARI and NMI.
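The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea, not the MD-SPKM method itself. It assumes the usual set pair analysis split of attribute comparisons into identity (both values known and equal), contrary (both known and different) and discrepancy (at least one value missing). The distance formula, the `uncertainty_weight` parameter, the `threshold` rule and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Value = Optional[str]  # None marks a missing attribute value


def set_pair_degrees(x: Sequence[Value], y: Sequence[Value]) -> Tuple[float, float, float]:
    """Return the (identity, discrepancy, contrary) fractions for two samples."""
    if len(x) != len(y) or not x:
        raise ValueError("samples must be non-empty and of equal length")
    same = unknown = diff = 0
    for xv, yv in zip(x, y):
        if xv is None or yv is None:
            unknown += 1  # comparison is uncertain because a value is missing
        elif xv == yv:
            same += 1
        else:
            diff += 1
    n = len(x)
    return same / n, unknown / n, diff / n


def set_pair_distance(x: Sequence[Value], y: Sequence[Value],
                      uncertainty_weight: float = 0.5) -> float:
    """Hypothetical distance: known mismatches plus partially weighted unknowns."""
    _, discrepancy, contrary = set_pair_degrees(x, y)
    return contrary + uncertainty_weight * discrepancy


def assign_regions(sample: Sequence[Value], modes: Sequence[Sequence[Value]],
                   threshold: float) -> Dict[str, Set[int]]:
    """Split cluster indices into positive, boundary and negative regions.

    Assumed rule (not taken from the paper): clusters whose distance is within
    `threshold` of the nearest one are candidates. A single candidate goes to
    the positive region; several candidates go to the boundary region.
    """
    distances = [set_pair_distance(sample, mode) for mode in modes]
    best = min(distances)
    candidates = {k for k, d in enumerate(distances) if d - best <= threshold}
    others = set(range(len(modes))) - candidates
    if len(candidates) == 1:
        return {"positive": candidates, "boundary": set(), "negative": others}
    return {"positive": set(), "boundary": candidates, "negative": others}
```

With these assumptions, a sample whose missing value makes it equally close to two cluster modes ends up in the boundary region of both, while a fully observed sample that matches one mode exactly falls in that cluster's positive region.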
Database: OpenAIRE