Actor-critic with familiarity-based trajectory experience replay
Authors: | Hengwei Lu, Xiaoyu Gong, Jiayu Yu, Shuai Lü |
---|---|
Year of publication: | 2022 |
Subject: |
Information Systems and Management, Computer Science, Computer Science Applications, Theoretical Computer Science, Artificial Intelligence, Control and Systems Engineering, Machine Learning, Sampling (statistics), Sample (statistics), Asynchronous Communication, Data Efficiency, Feature (machine learning), Trajectory, Inefficiency, Software |
Source: | Information Sciences. 582:633-647 |
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2021.10.031 |
Description: | This paper aims to solve sample inefficiency in Asynchronous Advantage Actor-Critic (A3C). First, we design a new off-policy actor-critic algorithm, which combines actor-critic with experience replay to improve sample efficiency. Next, we study the sampling method of experience replay for trajectory experience and propose a familiarity-based replay mechanism, which uses the number of times an experience has been replayed as its sampling probability weight. Finally, we use the GAE-V method to correct the bias caused by off-policy learning. We also achieve better performance by adopting a mechanism that combines off-policy and on-policy learning to update the network. Our results on the Atari and MuJoCo benchmarks show that each of these innovations contributes to improvements in both data efficiency and final performance. Furthermore, our approach retains a fast convergence speed and the same parallelism as A3C, and also achieves better exploration performance. |
Database: | OpenAIRE |
External link: |
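The familiarity-based replay mechanism described above weights each stored trajectory's sampling probability by how many times it has already been replayed. The paper itself is not reproduced here, so the following is only a minimal sketch under assumptions: the buffer name, the FIFO eviction policy, and the specific weight form `1 / (1 + count)` (favoring less-replayed, i.e. less "familiar", trajectories) are all illustrative choices, not the authors' implementation.

```python
import random

class FamiliarityReplayBuffer:
    """Hedged sketch of a familiarity-based trajectory replay buffer.

    Each stored trajectory carries a replay counter; sampling probability
    is a function of that counter. The weight 1 / (1 + count) used here
    is an assumed form favoring rarely replayed trajectories.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.trajectories = []   # stored trajectories (lists of transitions)
        self.replay_counts = []  # times each trajectory has been sampled

    def add(self, trajectory):
        if len(self.trajectories) >= self.capacity:
            # Evict the oldest trajectory (FIFO) -- a common default,
            # not necessarily the paper's eviction rule.
            self.trajectories.pop(0)
            self.replay_counts.pop(0)
        self.trajectories.append(trajectory)
        self.replay_counts.append(0)

    def sample(self):
        # Weight by familiarity: less-replayed trajectories get
        # higher probability under the assumed 1 / (1 + n) form.
        weights = [1.0 / (1 + n) for n in self.replay_counts]
        (idx,) = random.choices(range(len(self.trajectories)), weights=weights)
        self.replay_counts[idx] += 1  # this trajectory is now more familiar
        return self.trajectories[idx]
```

A fresh trajectory enters with count 0 and thus the highest weight; repeated sampling gradually flattens its advantage, spreading replay across the buffer.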
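The description also mentions GAE-V as the correction for off-policy bias. The sketch below shows one plausible reading: Generalized Advantage Estimation whose temporal-difference terms are scaled by truncated importance weights in the spirit of V-trace. The function name, the truncation constant `rho_bar`, and the exact recursion are assumptions for illustration, not the paper's stated formula.

```python
def gae_v(rewards, values, ratios, gamma=0.99, lam=0.95, rho_bar=1.0):
    """Hedged sketch of a V-trace-flavored GAE for off-policy correction.

    rewards[t] : reward at step t (length T)
    values     : value estimates V(s_0)..V(s_T) (length T + 1)
    ratios[t]  : importance ratio pi(a_t|s_t) / mu(a_t|s_t)
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        rho = min(rho_bar, ratios[t])  # truncated importance weight
        # TD error scaled by the truncated ratio (assumed correction form)
        delta = rho * (rewards[t] + gamma * values[t + 1] - values[t])
        gae = delta + gamma * lam * rho * gae
        advantages[t] = gae
    return advantages
```

When all ratios equal 1 (on-policy data), this reduces to standard GAE, which is the sanity check one would expect of any such correction.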