Popis: |
Model inversion (MI) attacks aim to infer and reconstruct the input data from the output of a neural network, which poses a severe threat to the privacy of that input data. Inspired by adversarial examples, we propose defending against MI attacks by adding adversarial noise to the output. The key challenge is finding a noise vector that maximizes the inversion error while introducing negligible utility loss to the target model. We propose an algorithm that crafts such noise vectors under utility-loss constraints. Specifically, our algorithm uses the gradient of an inversion model, which we train to mimic the adversary, to compute a noise vector that turns the output into an adversarial example maximizing the inversion model's reconstruction error. We then apply a label modifier that keeps the predicted label unchanged, so the target model incurs zero accuracy loss. Because our defense neither tampers with the training process nor requires the private training dataset, it can be easily applied to existing neural networks and APIs. We evaluate our method under both standard and adaptive attack settings. Empirical results show that our approach is effective against state-of-the-art MI attacks, owing to the transferability of adversarial examples, and that it outperforms existing defenses. Compared with existing defenses, it also induces larger reconstruction errors while introducing zero accuracy loss and less distortion. |
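The pipeline described above (gradient ascent on a surrogate inversion model's reconstruction error, a utility budget on the noise, then a label-preserving fix-up) can be illustrated with a minimal NumPy sketch. The following are all illustrative assumptions, not the authors' implementation: the linear surrogate `inversion_model` (G(y) = W y + b), the L-infinity budget `eps`, the FGSM-style signed steps, the clip-and-renormalize projection, and the swap-based `label_modifier`.

```python
import numpy as np

def inversion_model(y, W, b):
    """Hypothetical surrogate inversion model G(y) = W @ y + b mimicking the adversary."""
    return W @ y + b

def craft_noise(y, x, W, b, eps=0.1, steps=10, lr=0.02):
    """Signed gradient ascent on ||G(y + delta) - x||^2, keeping |delta_i| <= eps."""
    delta = np.zeros_like(y)
    for _ in range(steps):
        residual = inversion_model(y + delta, W, b) - x
        grad = 2.0 * W.T @ residual          # gradient of the squared error w.r.t. delta
        delta = delta + lr * np.sign(grad)   # FGSM-style ascent step
        delta = np.clip(delta, -eps, eps)    # assumed utility (distortion) budget
    return delta

def to_probabilities(v):
    """Map the perturbed vector back to a valid probability vector (clip, renormalize)."""
    v = np.clip(v, 1e-8, None)
    return v / v.sum()

def label_modifier(y_noisy, label):
    """Hypothetical label modifier: swap entries so the original label stays top-1."""
    y_fixed = y_noisy.copy()
    top = int(np.argmax(y_fixed))
    if top != label:
        y_fixed[top], y_fixed[label] = y_fixed[label], y_fixed[top]
    return y_fixed

def defend(y, x, W, b, **kwargs):
    """Return a noisy output that hurts inversion but keeps the predicted label."""
    label = int(np.argmax(y))
    delta = craft_noise(y, x, W, b, **kwargs)
    return label_modifier(to_probabilities(y + delta), label)
```

Because the surrogate here is linear, the squared error is a convex quadratic in `delta`. Each clipped signed step is therefore aligned with the gradient, and the surrogate's reconstruction error never decreases across iterations. With a nonlinear inversion network one would instead obtain the gradient via automatic differentiation. Swapping two entries preserves the sum of the vector, so the returned output stays a valid distribution with an unchanged argmax.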