Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Authors: Robert Mullins, Han Cui, Xitong Gao, Ross Anderson, Yiren Zhao, Ilia Shumailov
Year of publication: 2020
Subject:
FOS: Computer and information sciences
Computer Science - Machine Learning (cs.LG)
Statistics - Machine Learning (stat.ML)
Computer Science - Cryptography and Security (cs.CR)
Computer Science - Computer Vision and Pattern Recognition (cs.CV)
Computer science
Machine learning
Adversarial machine learning
Reinforcement learning
Temporal information
Sequence
Training methods
Gaussian noise
Artificial intelligence
Source: DSN Workshops
DOI: 10.17863/cam.60206
Description: Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters, or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the target model to our RL agents, they often outperform random Gaussian noise only marginally. This highlights a serious methodological deficiency in previous work on such agents; random jamming should have been taken as the baseline for evaluation. Third, we propose a novel use for adversarial samples in Black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This appears to be a genuinely new type of attack. It potentially enables an attacker to use devices controlled by RL agents as time bombs.
Database: OpenAIRE
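
The description above centres on an approximation model that watches a black-box agent and predicts its next action, or a short sequence of future actions, from time-series observations. Below is a minimal sketch of that idea, assuming a PyTorch LSTM encoder over a short observation window and a multi-step discrete action head; the architecture, dimensions, and training details are illustrative assumptions, not the authors' implementation.

# Sketch only: approximate a black-box RL agent's policy with a sequence
# model mapping a window of recent observations to the agent's next actions.
# Observation size, horizon, and hyper-parameters are assumed for illustration.
import torch
import torch.nn as nn

class ActionApproximator(nn.Module):
    """Predicts the target agent's future discrete actions from a short
    time series of observations (flattened frames or state vectors)."""
    def __init__(self, obs_dim, n_actions, hidden=128, horizon=1):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions * horizon)
        self.n_actions, self.horizon = n_actions, horizon

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim)
        _, (h, _) = self.encoder(obs_seq)
        logits = self.head(h[-1])                  # (batch, n_actions * horizon)
        return logits.view(-1, self.horizon, self.n_actions)

def train_step(model, optimiser, obs_seq, future_actions):
    # Observations and the agent's chosen actions are gathered purely by
    # watching the target agent act, i.e. black-box access only.
    logits = model(obs_seq)                        # (batch, horizon, n_actions)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, model.n_actions), future_actions.reshape(-1))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

if __name__ == "__main__":
    model = ActionApproximator(obs_dim=64, n_actions=6, horizon=3)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    obs = torch.randn(32, 10, 64)                  # 10-step observation windows
    acts = torch.randint(0, 6, (32, 3))            # agent's next 3 actions
    print(train_step(model, opt, obs, acts))

Once such an approximator reaches high prediction accuracy, adversarial samples crafted against it can be tested for transfer to the target agent, which is the setting the abstract compares against a random Gaussian noise baseline.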