A Criterion for Deciding the Number of Clusters in a Dataset Based on Data Depth
Author: | Ishwar Baidari, Channamma Patil |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | |
Source: | Vietnam Journal of Computer Science, Vol 7, Iss 4, Pp 417-431 (2020) |
Document type: | article |
ISSN: | 2196-8888, 2196-8896 |
DOI: | 10.1142/S2196888820500232 |
Description: | Clustering is a key method in unsupervised learning, with various applications in data mining, pattern recognition and intelligent information processing. However, the number of groups to be formed, usually denoted k, is a vital parameter for most existing clustering algorithms, as their results depend heavily on it. Finding the optimal value of k is very challenging. This paper proposes a novel idea for finding the correct number of groups in a dataset based on data depth. The idea is to avoid the traditional process of running the clustering algorithm over a dataset n times and, further, to find the value of k without setting any specific search range for it. We experiment with several indices, namely CH, KL, Silhouette, Gap and CSP, alongside the proposed method on different real and synthetic datasets to estimate the correct number of groups in a dataset. The experimental results on real and synthetic datasets indicate good performance of the proposed method. |
Database: | Directory of Open Access Journals |
External link: |
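
The "traditional process" mentioned in the description, in which a clustering algorithm is rerun for every candidate k and an index is used to pick the best run, can be sketched as follows. This is only an illustration of that baseline, not the data-depth method the paper proposes, whose details are not given in this record. The function names (`kmeans`, `silhouette`, `scan_k`), the farthest-point initialisation, the synthetic three-blob data and the search range 2..6 are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assign every point to its nearest center, then move each center to its mean.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width over all points (singleton clusters score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def scan_k(points, k_max):
    """Rerun the clustering for each k in 2..k_max and keep the best-scoring k."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated 2-D blobs of 20 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

best_k, scores = scan_k(points, k_max=6)
print(best_k, scores)
```

Note that the loop in `scan_k` needs both a search range (`k_max`) and one full clustering run per candidate k. These are the two costs the description says the proposed method is designed to avoid.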