Fast and slow curiosity for high-level exploration in reinforcement learning
| Author | Ryutaro Ichise, Nicolas Bougie |
| --- | --- |
| Year | 2020 |
| Subject | Reinforcement learning; Curiosity; Artificial intelligence; Machine learning; Computer science; Benchmark (computing); Open problem; Variety (cybernetics) |
| Source | Applied Intelligence 51:1086-1107 |
| ISSN | 0924-669X, 1573-7497 |
| DOI | 10.1007/s10489-020-01849-3 |
| Description | Deep reinforcement learning (DRL) algorithms rely on carefully designed environment rewards that are extrinsic to the agent. However, in many real-world scenarios, rewards are sparse or delayed, motivating the need to discover efficient exploration strategies. While intrinsically motivated agents hold the promise of better local exploration, solving problems that require coordinated decisions over long time horizons remains an open problem. We postulate that to discover such strategies, a DRL agent should be able to combine local and high-level exploration behaviors. To this end, we introduce the concept of fast and slow curiosity, which aims to incentivize long-horizon exploration. Our method decomposes the curiosity bonus into a fast reward that handles local exploration and a slow reward that encourages global exploration. We formulate this bonus as the error in an agent's ability to reconstruct observations given their contexts. We further propose to weight local and high-level strategies dynamically by measuring state diversity (see the sketch below the record). We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros, and Atari games. Experimental results show that our agent outperforms prior approaches on most tasks in terms of exploration efficiency and mean score. |
| Database | OpenAIRE |
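The abstract's core mechanism, a curiosity bonus split into fast (local) and slow (global) reconstruction-error terms, blended by a dynamic weight derived from state diversity, can be summarized with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the window of recent state embeddings, and the pairwise-distance diversity proxy below are all assumptions, and the actual reconstruction errors are presumed to come from separately trained models.

```python
import numpy as np


def state_diversity(recent_states: np.ndarray) -> float:
    """Illustrative diversity proxy: mean pairwise distance between recent
    state embeddings, squashed into [0, 1] with tanh. The paper's exact
    diversity measure may differ."""
    n = len(recent_states)
    if n < 2:
        return 0.0
    dists = [
        np.linalg.norm(recent_states[i] - recent_states[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.tanh(np.mean(dists)))


def curiosity_bonus(fast_error: float, slow_error: float,
                    recent_states: np.ndarray) -> float:
    """Blend the fast (local) and slow (global) reconstruction errors into a
    single intrinsic reward. High state diversity shifts weight toward the
    slow, global term; low diversity favors the fast, local term."""
    w = state_diversity(recent_states)  # dynamic weight in [0, 1]
    return (1.0 - w) * fast_error + w * slow_error
```

In use, the agent would optimize the extrinsic reward plus this bonus, e.g. `r_total = r_ext + beta * curiosity_bonus(fast_err, slow_err, window)`, where `beta` is a hypothetical scaling coefficient for the intrinsic term.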