Neural Computing and Applications volume / Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers

Author:	Hoedt, Katharina; Praher, Verena; Flexer, Arthur; Widmer, Gerhard
Language:	English
Year of publication:	2023
Subject:	Adversarial examples; Interpretability; Explainability; Evaluation
DOI:	10.1007/s00521-022-07918-7
Description:	Given the rise of deep learning and its inherent black-box nature, the desire to interpret these systems and explain their behaviour has become increasingly prominent. The main idea of so-called explainers is to identify which features of particular samples have the most influence on a classifier’s prediction, and present them as explanations. Evaluating explainers, however, is difficult, for reasons such as a lack of ground truth. In this work, we construct adversarial examples to check the plausibility of explanations, deliberately perturbing the input to change a classifier’s prediction. This allows us to investigate whether explainers are able to detect these perturbed regions as the parts of an input that strongly influence a particular classification. Our results from the audio and image domains suggest that the investigated explainers often fail to identify the input regions most relevant for a prediction; hence, it remains questionable whether explanations are useful or potentially misleading. Funding: Fonds zur Förderung der Wissenschaftlichen Forschung, P31988. Version of record.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=od______3361::259f33e9cdd2e90c91e4212498eba3e2 https://epub.jku.at/doi/10.1007/s00521-022-07918-7