Actor-critic with familiarity-based trajectory experience replay
Authors: | Hengwei Lu, Xiaoyu Gong, Jiayu Yu, Shuai Lü |
---|---|
Year of publication: | 2022 |
Subject: |
Information Systems and Management, Computer Science, Computer Science Applications, Theoretical Computer Science, Artificial Intelligence, Control and Systems Engineering, Machine Learning, Sampling (statistics), Sample (statistics), Asynchronous Communication, Data Efficiency, Feature (machine learning), Trajectory, Inefficiency, Software |
Source: | Information Sciences. 582:633-647 |
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2021.10.031 |
Description: | This paper aims to solve sample inefficiency in Asynchronous Advantage Actor-Critic (A3C). First, we design a new off-policy actor-critic algorithm, which combines actor-critic with experience replay to improve sample efficiency. Next, we study the sampling method of experience replay for trajectory experience and propose a familiarity-based replay mechanism, which uses the number of times an experience has been replayed as its sampling probability weight. Finally, we use the GAE-V method to correct the bias caused by off-policy learning. We also achieve better performance by adopting a mechanism that combines off-policy and on-policy learning to update the network. Our results on the Atari and MuJoCo benchmarks show that each of these innovations contributes to improvements in both data efficiency and final performance. Furthermore, our approach retains a fast convergence speed and the same parallelism as A3C, and also achieves better exploration performance. |
Database: | OpenAIRE |
External link: |
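The familiarity-based replay mechanism described above weights each stored trajectory's sampling probability by how many times it has already been replayed. The paper itself is not reproduced here, so the following is only a minimal sketch under assumptions: the buffer name, the FIFO eviction policy, and the specific weight form `1 / (1 + count)` (favoring less-replayed, i.e. less "familiar", trajectories) are all illustrative choices, not the authors' implementation.

```python
import random

class FamiliarityReplayBuffer:
    """Hedged sketch of a familiarity-based trajectory replay buffer.

    Each stored trajectory carries a replay counter; sampling probability
    is a function of that counter. The weight 1 / (1 + count) used here
    is an assumed form favoring rarely replayed trajectories.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.trajectories = []   # stored trajectories (lists of transitions)
        self.replay_counts = []  # times each trajectory has been sampled

    def add(self, trajectory):
        if len(self.trajectories) >= self.capacity:
            # Evict the oldest trajectory (FIFO) -- a common default,
            # not necessarily the paper's eviction rule.
            self.trajectories.pop(0)
            self.replay_counts.pop(0)
        self.trajectories.append(trajectory)
        self.replay_counts.append(0)

    def sample(self):
        # Weight by familiarity: less-replayed trajectories get
        # higher probability under the assumed 1 / (1 + n) form.
        weights = [1.0 / (1 + n) for n in self.replay_counts]
        (idx,) = random.choices(range(len(self.trajectories)), weights=weights)
        self.replay_counts[idx] += 1  # this trajectory is now more familiar
        return self.trajectories[idx]
```

A fresh trajectory enters with count 0 and thus the highest weight; repeated sampling gradually flattens its advantage, spreading replay across the buffer.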
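The description also mentions GAE-V as the correction for off-policy bias. The sketch below shows one plausible reading: Generalized Advantage Estimation whose temporal-difference terms are scaled by truncated importance weights in the spirit of V-trace. The function name, the truncation constant `rho_bar`, and the exact recursion are assumptions for illustration, not the paper's stated formula.

```python
def gae_v(rewards, values, ratios, gamma=0.99, lam=0.95, rho_bar=1.0):
    """Hedged sketch of a V-trace-flavored GAE for off-policy correction.

    rewards[t] : reward at step t (length T)
    values     : value estimates V(s_0)..V(s_T) (length T + 1)
    ratios[t]  : importance ratio pi(a_t|s_t) / mu(a_t|s_t)
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        rho = min(rho_bar, ratios[t])  # truncated importance weight
        # TD error scaled by the truncated ratio (assumed correction form)
        delta = rho * (rewards[t] + gamma * values[t + 1] - values[t])
        gae = delta + gamma * lam * rho * gae
        advantages[t] = gae
    return advantages
```

When all ratios equal 1 (on-policy data), this reduces to standard GAE, which is the sanity check one would expect of any such correction.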