XplainableClusterExplorer

Author: Andreas Theissler, Dominik Raab, Eric Fezer
Year of publication: 2020
Subject:
Source: VINCI
DOI: 10.1145/3430036.3430066
Description: Human-centered machine learning is an emerging field that aims to enable domain experts who do not necessarily have a data science background to make use of machine learning applications. Especially in unsupervised machine learning, e.g. cluster analysis, models cannot be autonomously tuned towards an optimal solution for a given application because ground truth such as class labels is absent. In cluster analysis, different feature subsets may lead to different clusterings. Identifying the best subset of the given features is therefore essential in order to improve the overall clustering performance and to obtain a clustering that is suitable for a given application. To support users in finding an optimal clustering solution, we propose XplainableClusterExplorer, an interactive and explorative approach to feature selection for clustering. In an interactive combination of user and machine learning models, the user is supported by evaluation criteria and visualizations in determining feature subsets and adjusting hyperparameters. For feature subset selection we propose a combination with feature importances from random forests and LIME. Since this requires a supervised setting, the cluster assignments are used as tentative class labels in a subsequent step. Our experimental results show that this subsequent classification step, which leverages the calculated feature importances, can facilitate feature subset selection and therefore enhance overall clustering performance.
Database: OpenAIRE
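
The core idea in the description (cluster first, then treat the cluster assignments as tentative class labels so that a random forest can rank the features) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, k-means as the clustering algorithm, the synthetic data, the helper name `rank_features_by_cluster_labels`, the fixed choice of keeping the top two features, and the silhouette score as the evaluation criterion are all assumptions made for illustration. The LIME step and the interactive, visual part of the approach are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score


def rank_features_by_cluster_labels(X, n_clusters, random_state=0):
    """Hypothetical helper: cluster X, then fit a random forest on the
    cluster assignments (used as tentative class labels) and return the
    labels together with the forest's impurity-based feature importances."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    tentative_labels = km.fit_predict(X)
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(X, tentative_labels)
    return tentative_labels, forest.feature_importances_


# Synthetic data (assumption): features 0 and 1 carry three well-separated
# groups, features 2 to 4 are pure Gaussian noise.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
informative = np.vstack([c + rng.normal(0.0, 1.0, size=(100, 2)) for c in centers])
noise = rng.normal(0.0, 1.0, size=(300, 3))
X = np.hstack([informative, noise])

# Step 1: cluster on all features and rank features via the tentative labels.
labels_all, importances = rank_features_by_cluster_labels(X, n_clusters=3)

# Step 2: keep the two highest-ranked features (a fixed cut-off chosen here
# for illustration; in the described approach the user decides interactively).
top_features = np.argsort(importances)[::-1][:2]
X_subset = X[:, top_features]

# Step 3: re-cluster on the reduced feature subset and compare the results.
labels_subset = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_subset)
score_all = silhouette_score(X, labels_all)
score_subset = silhouette_score(X_subset, labels_subset)

print("feature importances:", np.round(importances, 3))
print("selected features:", sorted(top_features.tolist()))
print("silhouette (all features):", round(score_all, 3))
print("silhouette (selected subset):", round(score_subset, 3))
```

In this toy setting the noise features should receive low importance, so dropping them gives a cleaner clustering. On real data the ranking only reflects which features drive the current clustering, which is why the description pairs it with evaluation criteria, visualizations, and user judgment.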