XplainableClusterExplorer
Author: | Andreas Theissler, Dominik Raab, Eric Fezer |
---|---|
Year of publication: | 2020 |
Subject: | Selection (relational algebra); 010308 nuclear & particles physics; Computer science; business.industry; k-means clustering; 020207 software engineering; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; 01 natural sciences; Field (computer science); Random forest; Feature (computer vision); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; Artificial intelligence; business; Cluster analysis; computer |
Source: | VINCI |
DOI: | 10.1145/3430036.3430066 |
Description: | Human-centered machine learning is an emerging field that aims to enable domain experts who do not necessarily have a data science background to make use of machine learning applications. In unsupervised machine learning in particular, e.g. cluster analysis, models cannot be tuned autonomously towards an optimal solution for a given application because ground truth such as class labels is absent. In cluster analysis, different feature subsets may lead to different clusterings. Identifying the best subset of the given features is therefore essential to improve overall clustering performance and to obtain a clustering that is suitable for a given application. To support users in finding an optimal clustering solution, we propose XplainableClusterExplorer, an interactive and explorative approach to feature selection for clustering. In an interactive combination of user and machine learning models, the user is supported by evaluation criteria and visualizations in determining feature subsets and adjusting hyperparameters. For feature subset selection, we propose combining the approach with feature importances from random forests and LIME. Since this requires a supervised setting, the cluster assignments are used as tentative class labels in a subsequent step. Our experimental results show that this subsequent classification step, which leverages the calculated feature importances, can facilitate feature subset selection and thereby enhance overall clustering performance. |
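
The loop described in the abstract (cluster, treat the cluster assignments as tentative class labels, fit a random forest, rank features by importance, re-cluster on a subset) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, synthetic data with two informative and three noise features, k-means as the clustering model, the silhouette score as the evaluation criterion, and a fixed subset size of two. The helper name `rank_features_by_cluster_labels` is hypothetical, and the interactive and LIME components are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score


def rank_features_by_cluster_labels(X, n_clusters, seed=0):
    """Cluster X, then rank features by how well they predict the clusters.

    Hypothetical helper for illustration only.
    """
    # Step 1: unsupervised clustering on all features.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X)
    # Step 2: use the cluster assignments as tentative class labels.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, labels)
    # Step 3: impurity-based feature importances, most important first.
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return labels, importances, order


# Assumed toy data: features 0 and 1 carry the cluster structure,
# features 2 to 4 are pure Gaussian noise.
informative, _ = make_blobs(
    n_samples=300,
    centers=[(-8, -8), (0, 8), (8, -8)],
    cluster_std=1.0,
    random_state=0,
)
noise = np.random.default_rng(0).normal(size=(300, 3))
X = np.hstack([informative, noise])

labels_full, importances, order = rank_features_by_cluster_labels(X, n_clusters=3)

# Re-cluster on the two top-ranked features and compare cluster quality.
subset = np.sort(order[:2])
X_subset = X[:, subset]
labels_subset = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_subset)

sil_full = silhouette_score(X, labels_full)
sil_subset = silhouette_score(X_subset, labels_subset)

print("feature ranking:", order.tolist())
print("silhouette, all features:", round(float(sil_full), 3))
print("silhouette, selected subset:", round(float(sil_subset), 3))
```

In an interactive setting, the user rather than a fixed cut-off would choose the subset size and the number of clusters, guided by the ranking and the evaluation score.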
Database: | OpenAIRE |
External link: |