Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search

Authors:	Alvaro Velasquez, Brett Bissey, Lior Barak, Andre Beckus, Ismail Alkhouri, Daniel Melcer, George Atia
Year of publication:	2021
Subject:	General Medicine
Source:	Proceedings of the AAAI Conference on Artificial Intelligence. 35:12015-12023
ISSN:	2374-3468, 2159-5399
DOI:	10.1609/aaai.v35i13.17427
Abstract:	Reinforcement learning and planning have been revolutionized in recent years, due in part to the mass adoption of deep convolutional neural networks and the resurgence of powerful methods to refine decision-making policies. However, the problem of sparse reward signals and their representation remains pervasive in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, the use of human-aided artificial rewards introduces human error, sub-optimal behavior, and a greater propensity for reward hacking. In this paper, we mitigate this by representing objectives as automata in order to define novel reward shaping functions over this structured representation. In doing so, we address the sparse rewards problem within a novel implementation of Monte Carlo Tree Search (MCTS) by proposing a reward shaping function that is updated dynamically to capture statistics on the utility of each automaton transition as it pertains to satisfying the goal of the agent. We further demonstrate that such automaton-guided reward shaping can be utilized to facilitate transfer learning between different environments when the objective is the same.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::3218288d307917c7030bdc819b29a694, https://doi.org/10.1609/aaai.v35i13.17427
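The abstract describes a shaping reward that is "updated dynamically to capture statistics on the utility of each automaton transition," but it does not give the formula. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the two-step objective automaton ("pick up `key`, then reach `door`"), the `DFA` and `TransitionStats` names, the rule that unmatched labels leave the automaton state unchanged, and the choice of the empirical fraction of successful rollouts through a transition as that transition's shaping value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DFA:
    """Hypothetical objective automaton over event labels."""
    transitions: dict      # (state, label) -> next state
    initial: int
    accepting: frozenset

    def step(self, q, label):
        # Assumption: a label with no explicit transition leaves the state unchanged.
        return self.transitions.get((q, label), q)


class TransitionStats:
    """Running per-transition statistics, updated after every rollout (assumed scheme)."""

    def __init__(self):
        self.visits = defaultdict(int)     # rollouts that took transition (q, q')
        self.successes = defaultdict(int)  # ...and ended in an accepting state

    def update(self, dfa, labels):
        # Replay one rollout's labels through the DFA and record which
        # state-changing transitions it took and whether it satisfied the objective.
        q = dfa.initial
        taken = set()
        for label in labels:
            q_next = dfa.step(q, label)
            if q_next != q:
                taken.add((q, q_next))
            q = q_next
        success = q in dfa.accepting
        for t in taken:
            self.visits[t] += 1
            if success:
                self.successes[t] += 1
        return success

    def shaping_reward(self, q, q_next, prior=0.0):
        # Assumed shaping value: empirical success rate of rollouts using (q, q_next);
        # `prior` is returned for transitions that have not been observed yet.
        n = self.visits.get((q, q_next), 0)
        return prior if n == 0 else self.successes.get((q, q_next), 0) / n


# Usage: three simulated rollouts, described only by their event labels.
dfa = DFA({(0, "key"): 1, (1, "door"): 2}, initial=0, accepting=frozenset({2}))
stats = TransitionStats()
stats.update(dfa, ["key", "door"])  # satisfies the objective
stats.update(dfa, ["key", "lava"])  # takes 0 -> 1, then fails
stats.update(dfa, ["wall"])         # never leaves state 0
print(stats.shaping_reward(0, 1))   # 0.5
print(stats.shaping_reward(1, 2))   # 1.0
```

In an MCTS rollout, a value like `stats.shaping_reward(q, q_next)` could be added to the sparse environment reward whenever the automaton advances, so that partial progress toward the objective is rewarded before the goal itself is reached. Because the statistics are keyed on automaton transitions rather than environment states, the same table could in principle be reused in another environment with the same objective, which is the transfer setting the abstract mentions.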