Model Reconstruction from Model Explanations

Authors:	Anca D. Dragan, Ludwig Schmidt, Smitha Milli, Moritz Hardt
Year of publication:	2019
Subject:	FOS: Computer and information sciences; Orders of magnitude (bit rate); Model reconstruction; Computer Science - Machine Learning; Theoretical computer science; Statistics - Machine Learning; Computer science; Perspective (graphical); Dimension (graph theory); Machine Learning (stat.ML); Heuristics; Machine Learning (cs.LG); Power (physics)
Source:	FAT
Description:	We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations. On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its dependence on the model size. Of interest not only from a learning-theoretic perspective, this result highlights the power of gradients rather than labels as a learning primitive. Complementing our theory, we give effective heuristics for reconstructing models from gradient explanations that are orders of magnitude more query-efficient than reconstruction attacks relying on prediction interfaces.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::278162ca874a045f480018c6ec88c671 https://doi.org/10.1145/3287560.3287562
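
The description says that gradient queries on a two-layer ReLU network are enough to learn it. The sketch below shows why gradients carry so much information. It is not the authors' algorithm. It assumes a network of the form f(x) = sum_i a_i * max(0, w_i . x + b_i), whose gradient is piecewise constant. Along a line x(t) = u + t*v the gradient changes only where a hidden unit switches on or off, and the size of the change is a_i * w_i up to sign. The names `gradient_oracle` and `find_jumps`, the toy weights, and the bisection strategy are illustrative assumptions, not taken from the record.

```python
import numpy as np

def gradient_oracle(W, b, a):
    """Return a function x -> grad f(x) for f(x) = sum_i a_i * relu(w_i . x + b_i).

    Hypothetical stand-in for an explanation API that returns input gradients.
    """
    def grad(x):
        active = (W @ x + b) > 0      # which hidden units are switched on at x
        return (a * active) @ W       # sum of a_i * w_i over the active units
    return grad

def find_jumps(grad, u, v, lo, hi, tol=1e-9):
    """Bisect the segment x(t) = u + t*v, t in [lo, hi], for gradient changes.

    Returns a list of (t, jump) pairs, ordered by t, where jump is the
    difference of the gradients just after and just before the change.
    Assumes generic weights, so different sets of active units give
    different gradients.
    """
    g_lo, g_hi = grad(u + lo * v), grad(u + hi * v)
    if np.allclose(g_lo, g_hi):
        return []                     # no unit switches inside this interval
    if hi - lo < tol:
        return [(0.5 * (lo + hi), g_hi - g_lo)]
    mid = 0.5 * (lo + hi)
    return (find_jumps(grad, u, v, lo, mid, tol)
            + find_jumps(grad, u, v, mid, hi, tol))

# Toy "secret" model (assumed values, chosen so the answer can be checked by hand).
W = np.array([[1.0, 0.0, 2.0],
              [0.0, -1.0, 1.0]])
b = np.array([-1.0, 2.0])
a = np.array([2.0, -3.0])

grad = gradient_oracle(W, b, a)
u = np.zeros(3)
v = np.array([1.0, 2.0, 0.5])

for t, jump in find_jumps(grad, u, v, lo=-4.0, hi=4.0):
    print(f"gradient changes near t = {t:.6f}, jump = {jump}")
```

In this toy case each jump is a hidden unit's weight vector scaled by its output weight, and the location of the jump gives one linear constraint on that unit's bias. The number of queries in the sketch depends on the number of hidden units and the bisection tolerance, not on the input dimension.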