The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Authors: Ida Bagus Gede Sarasvananda, Retantyo Wardoyo, Anny Kartika Sari
Language: English, Indonesian
Publication year: 2019
Subject:
Source: IJCCS (Indonesian Journal of Computing and Cybernetics Systems), Vol 13, Iss 4, Pp 313-322 (2019)
Document type: article
ISSN: 1978-1520, 2460-7258
DOI: 10.22146/ijccs.45093
Description: The cost of a patient’s hospitalization can be estimated by clustering patients. K-means is one of the most widely used clustering algorithms. Because it is based on distance, however, K-means is weak at measuring the proximity of meaning, or semantics, between data items. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity can be calculated. This study aims to cluster patient data while taking into account the similarity of the patients’ diseases. The ICD code is used as a guide to determine a patient’s disease. The K-means method is combined with semantic similarity to measure the proximity of patients’ ICD codes. The semantic similarity measures used in this study are those of Girardi, Leacock & Chodorow, and Rada, together with Jaccard similarity. Cluster quality is measured with the silhouette coefficient. Based on the experimental results, measuring semantic similarity produces better-quality clustering results than clustering without it. The best accuracy is 91.78% for the three semantic similarity methods, whereas without semantic similarity the best accuracy is 84.93%.
Database: Directory of Open Access Journals
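
To illustrate how semantic proximity between ICD codes might be computed, the following is a minimal sketch of three of the measures named in the description (Rada, Leacock & Chodorow, and Jaccard). It is not the authors' implementation. The toy hierarchy built from code prefixes, the `DEPTH` constant, the use of ancestor sets for Jaccard, and all function names are assumptions made for the example; real ICD chapters and blocks are code ranges rather than simple prefixes.

```python
import math

# Assumption: a toy tree with four levels (ROOT, letter, category, full code).
DEPTH = 4


def ancestors(code):
    """Path from the root to the code, e.g. ['ROOT', 'J', 'J45', 'J45.0']."""
    path = ["ROOT", code[0]]
    if len(code) >= 3:
        path.append(code[:3])
    if len(code) > 3:
        path.append(code)
    return path


def rada_distance(a, b):
    """Rada-style distance: number of edges on the shortest path in the tree."""
    pa, pb = ancestors(a), ancestors(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def leacock_chodorow(a, b):
    """Leacock & Chodorow similarity: -log(path length in nodes / (2 * DEPTH))."""
    nodes_on_path = rada_distance(a, b) + 1
    return -math.log(nodes_on_path / (2 * DEPTH))


def jaccard(a, b):
    """Jaccard similarity of the two codes' ancestor sets (an assumed encoding)."""
    sa, sb = set(ancestors(a)), set(ancestors(b))
    return len(sa & sb) / len(sa | sb)
```

In this sketch, two codes in the same category (such as `J45.0` and `J45.1`) are closer than two codes that share only a first letter, which in turn are closer than codes from different letters. A clustering step could consume such pairwise values in place of a purely numeric distance.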