Author:
Agrawal, Nishita, Pendharkar, Isha, Shroff, Jugal, Raghuvanshi, Jatin, Neogi, Akashdip, Patil, Shruti, Walambe, Rahee, Kotecha, Ketan
Source:
AI and Ethics; November 2024, Vol. 4, Issue 4, pp. 1143-1174, 32 pp.
Abstract:
With recent advances in the use of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems make predictions using the correct features. For this purpose, many different model interpretability and explainability methods have been proposed in the literature. However, with the rising number of adversarial attacks against such AI-based systems, it is also necessary to make them more robust to adversarial attacks and to validate the correctness of the generated explanations. In this work, we first demonstrate how an adversarial attack can affect model explainability even after robust training. We then present two types of attack classifiers: one that detects whether a given input is benign or adversarial, and another that identifies the type of attack. We also use model explainability to identify the regions affected by the adversarial attack. Finally, we demonstrate how the correctness of the generated explanations can be verified using model interpretability methods.
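The idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): apply a one-step FGSM perturbation to an input and compare a simple gradient-based saliency map for the clean and attacked versions, showing how an attack can shift the regions an explanation highlights. The model (a torchvision ResNet-18), the epsilon value, and the saliency method are illustrative assumptions, not details taken from the paper.

# Minimal sketch, assuming a PyTorch image classifier; not the paper's method.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()  # illustrative model choice

def fgsm_attack(x, label, eps=0.03):
    # One-step FGSM: perturb x in the direction of the sign of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def saliency(x):
    # Simple gradient-based saliency: |d(max logit)/d(input)|, summed over channels.
    x = x.clone().detach().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.abs().sum(dim=1)  # shape (N, H, W) heat map

x = torch.rand(1, 3, 224, 224)       # stand-in for a preprocessed medical image
label = model(x).argmax(dim=1)       # use the model's own prediction as the label
x_adv = fgsm_attack(x, label)

# Regions where the explanation changes most under attack.
shift = (saliency(x) - saliency(x_adv)).abs()
print("mean saliency shift:", shift.mean().item())

Comparing the two saliency maps (or the shift map) is one rough way to visualize which regions an attack disturbs; the paper's own attack-detection and attack-type classifiers and its verification of explanation correctness go beyond this sketch.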
Database:
Supplemental Index
External link: