Black-box Bayesian adversarial attack with transferable priors.

Author: Zhang, Shudong; Gao, Haichang; Shu, Chao; Cao, Xiwen; Zhou, Yunyi; He, Jianping
Source: Machine Learning; Apr 2024, Vol. 113 Issue 4, p1511-1528, 18p
Abstract: Deep neural networks are vulnerable to adversarial attacks even in the black-box setting, where the attacker has only query access to the model. The most popular black-box adversarial attacks rely on substitute models or gradient estimation to generate imperceptible adversarial examples, and they suffer from either low attack success rates or low query efficiency. In real-world scenarios, it is extremely improbable that an attacker has unlimited bandwidth to query a target classifier. In this paper, we propose a query-efficient, gradient-free, score-based attack, named BO-ATP, which combines a Bayesian optimization strategy with transfer-based attacks and searches for perturbations in a low-dimensional latent space. Unlike gradient-based methods, our attack makes full use of the prior information obtained from all previous queries to sample the next optimal point, rather than relying on local gradient approximation. Results on MNIST, CIFAR10, and ImageNet show that even with a low budget of 1000 queries, we still achieve high attack success rates in both targeted and untargeted attacks, with query efficiency dozens of times higher than previous state-of-the-art attack methods. Furthermore, we show that BO-ATP can successfully attack state-of-the-art defenses such as adversarial training. [ABSTRACT FROM AUTHOR]
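To make the abstract's idea concrete, the following is a minimal sketch (not the authors' implementation, which also incorporates transferable priors from a surrogate model, omitted here) of a score-based black-box attack driven by Bayesian optimization over a low-dimensional latent perturbation. It uses scikit-optimize's Gaussian-process optimizer; the names query_fn, latent_dim, eps, and budget, and all default values, are illustrative assumptions.

import numpy as np
from skopt import gp_minimize  # GP surrogate + acquisition-driven sampling

def bo_attack(query_fn, image, label, latent_dim=16, eps=0.05, budget=100):
    """Untargeted attack sketch: drive the true-class margin below zero.

    query_fn(x) -> 1-D array of class scores for a single image x in [0, 1].
    image: float array of shape (h, w, c); h and w divisible by sqrt(latent_dim).
    """
    h, w, _ = image.shape
    side = int(np.sqrt(latent_dim))              # latent grid is side x side

    def objective(z):
        # Upsample the latent grid to image size (nearest neighbour) and
        # stay inside the L-inf ball of radius eps around the clean image.
        grid = np.asarray(z).reshape(side, side)
        delta = np.kron(grid, np.ones((h // side, w // side)))[:, :, None]
        x_adv = np.clip(image + eps * delta, 0.0, 1.0)
        scores = query_fn(x_adv)                 # one query to the target
        others = np.delete(scores, label)
        return float(scores[label] - others.max())  # < 0 means success

    # Each acquisition step conditions on ALL previous (z, score) pairs via
    # the GP posterior, instead of estimating a local gradient.
    res = gp_minimize(objective,
                      dimensions=[(-1.0, 1.0)] * latent_dim,
                      n_calls=budget,
                      n_initial_points=10,
                      random_state=0)
    return res.x, res.fun                        # best latent, best margin

Searching in a side-by-side latent grid rather than full pixel space is what keeps the GP tractable and the query count low, which is the core efficiency argument of the abstract.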
Database: Complementary Index