Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients

Authors:	Filipe Mutz, Paulo E. Rauber, Jürgen Schmidhuber, Avinash Ummadisingu
Year of publication:	2021
Subject:	0209 industrial biotechnology; Class (computer programming); Exploit; Computer science; business.industry; Cognitive Neuroscience; Sample (statistics); 02 engineering and technology; Machine learning; computer.software_genre; 020901 industrial engineering & automation; Arts and Humanities (miscellaneous); 0202 electrical engineering, electronic engineering, information engineering; Selection (linguistics); Reinforcement learning; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Hindsight bias
Source:	Neural Computation. 33:1498-1553
ISSN:	1530-888X 0899-7667
Description:	A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ffde3a641ec00e6ca2c4913c7ff73619 https://doi.org/10.1162/neco_a_01387