Towards expert-inspired automatic criterion to cut a dendrogram for real-industrial applications

Autor: Shikha Suman, Ashutosh Karna, Karina Gibert
Přispěvatelé: Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial, Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic
Jazyk: angličtina
Rok vydání: 2021
Předmět:
Zdroj: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Popis: Hierarchical clustering is one of the most preferred choices to understand the underlying structure of a dataset and defining typologies, with multiple applications in real life. Among the existing clustering algorithms, the hierarchical family is one of the most popular, as it permits to understand the inner structure of the dataset and find the number of clusters as an output, unlike popular methods, like k-means. One can adjust the granularity of final clustering to the goals of the analysis themselves. The number of clusters in a hierarchical method relies on the analysis of the resulting dendrogram itself. Experts have criteria to visually inspect the dendrogram and determine the number of clusters. Finding automatic criteria to imitate experts in this task is still an open problem. But, dependence on the expert to cut the tree represents a limitation in real applications like the fields industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining the suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed such that it properly catches the implicit criteria used by experts when analyzing dendrograms. The proposal has been applied on a range of datasets and validated against experts ground-truth overcoming the results obtained by the State of the Art and also significantly reduces the computational cost .
Databáze: OpenAIRE