Interpretability in healthcare: A comparative study of local machine learning interpretability techniques
Author: Youssef Sherif, Sherif Sakr, Mouaz H. Al-Mallah, Radwa Elshawi
Year of publication: 2020
Subject: Interpretability, Machine learning, Artificial intelligence, Artificial neural network, Random forest, Decision tree, Linear regression, Regression analysis, Big data, Data modeling, Stability (learning theory), Metric (mathematics), General Data Protection Regulation, Computational mathematics, Computer science, Process (engineering)
Source: CBMS
ISSN: 1467-8640, 0824-7935
Description: Although complex machine learning models (e.g., random forests, neural networks) commonly outperform traditional, simple, interpretable models (e.g., linear regression, decision trees), clinicians in the healthcare domain find it hard to understand and trust these complex models because their predictions lack intuition and explanation. With the General Data Protection Regulation (GDPR), the plausibility and verifiability of predictions made by machine learning models have become essential. To tackle this challenge, several machine learning interpretability techniques have recently been developed and introduced. In general, these techniques aim to shed light on the prediction process of machine learning models and to explain how the model predictions are produced. In practice, however, assessing the quality of the explanations provided by the various interpretability techniques remains an open question. In this paper, we present a comprehensive experimental evaluation of three recent and popular local model-agnostic interpretability techniques, namely LIME, SHAP, and Anchors, on different types of real-world healthcare data. Our evaluation compares the techniques along several dimensions: identity, stability, separability, similarity, execution time, and bias detection. The results of our experiments show that LIME achieves the lowest performance on the identity metric and the highest performance on the separability metric across all datasets included in this study. SHAP has the shortest average time to produce an explanation across all datasets. In the bias-detection task, SHAP best enables participants to detect the bias.
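To make the comparison concrete, below is a minimal sketch (not the authors' experimental code) of how LIME and SHAP local explanations might be generated for the same prediction of a black-box model on tabular data, together with a toy check in the spirit of the paper's identity metric. The dataset, model, and all parameter choices are illustrative assumptions; the sketch assumes the `lime`, `shap`, and `scikit-learn` packages are installed.

```python
# Illustrative sketch only: LIME and SHAP local explanations for one prediction
# of a black-box classifier, plus a toy "identity" check (same instance explained
# twice). Dataset and hyperparameters are stand-in assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import load_breast_cancer      # stand-in for a healthcare dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

import lime.lime_tabular
import shap

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
instance = X_test[0]

# --- LIME: fit a sparse local surrogate around the instance ---
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train, feature_names=list(data.feature_names),
    class_names=list(data.target_names), mode="classification")
lime_exp = lime_explainer.explain_instance(instance, model.predict_proba, num_features=5)
print("LIME:", lime_exp.as_list())

# --- SHAP (model-agnostic KernelExplainer): additive feature attributions ---
background = shap.sample(X_train, 50)                 # small background set keeps it fast
shap_explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = shap_explainer.shap_values(instance, nsamples=100)  # shape varies by shap version
print("SHAP:", np.round(np.asarray(shap_values), 3))

# Toy "identity" check: explaining the same instance twice should ideally yield
# identical attributions.
lime_exp2 = lime_explainer.explain_instance(instance, model.predict_proba, num_features=5)
print("LIME identical twice?", lime_exp.as_list() == lime_exp2.as_list())
```

Because LIME fits its surrogate on randomly perturbed samples, the two LIME calls typically return slightly different attributions, which is consistent with the paper's finding that LIME scores lowest on the identity metric.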
Database: OpenAIRE
External link: