Exploring adversarial examples and adversarial robustness of convolutional neural networks by mutual information.

Author: Zhang, Jiebao; Qian, Wenhua; Cao, Jinde; Xu, Dan
Subject:
Source: Neural Computing & Applications; Aug2024, Vol. 36 Issue 23, p14379-14394, 16p
Abstract: Convolutional neural networks (CNNs) are susceptible to adversarial examples, which are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective defense method for improving the robustness of CNNs to adversarial examples. Many works explore the mechanism behind adversarial examples and adversarial training, but mutual information has rarely been used to interpret these counter-intuitive phenomena. This work investigates the similarities and differences between normally trained CNNs (NT-CNNs) and adversarially trained CNNs (AT-CNNs) from the mutual information perspective. We show that although the mutual information trends of NT-CNNs and AT-CNNs are similar throughout training for both original and adversarial examples, a clear difference exists. Compared with NT-CNNs, AT-CNNs achieve lower clean accuracy and extract less information from the input. CNNs trained with different methods also prefer different types of information: NT-CNNs tend to extract texture-based information from the input, while AT-CNNs prefer shape-based information. Adversarial examples may mislead CNNs because they contain more texture-based information about other classes. Furthermore, we analyze the mutual information estimators used in this work and find that they outline the geometric properties of the middle layer's output. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
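Note: The abstract does not specify the mutual information estimator used by the authors. The following is only a minimal illustrative sketch of a binning-based estimate of the mutual information between a scalar summary of a layer's activations and the class labels; all names and parameters are assumptions introduced here, not taken from the paper.

import numpy as np

def mutual_information(x, y, bins=30):
    """Estimate I(X; Y) in nats for two 1-D arrays via a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(y)
    nonzero = pxy > 0                          # avoid log(0) terms
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Hypothetical usage: synthetic stand-ins for how much label information a
# layer retains in a normally trained vs. an adversarially trained network.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000).astype(float)
nt_layer = labels + 0.5 * rng.normal(size=5000)   # stand-in for NT-CNN features
at_layer = labels + 1.5 * rng.normal(size=5000)   # stand-in for AT-CNN features
print(mutual_information(nt_layer, labels), mutual_information(at_layer, labels))

Under this sketch, the noisier "AT" features yield a lower mutual information estimate, mirroring the abstract's claim that AT-CNNs extract less information from the input.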