Author: |
LI Hailiang, WANG Li |
Language: |
English<br />Chinese |
Year of Publication: |
2024 |
Subject: |
|
Source: |
Taiyuan Ligong Daxue xuebao, Vol 55, Iss 4, Pp 712-719 (2024) |
Document Type: |
article |
ISSN: |
1007-9432 |
DOI: |
10.16355/j.tyut.1007-9432.20230300 |
Description: |
Purposes The algorithm of phasic policy gradient with sample reuse (SR-PPG) is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. Methods In the proposed algorithm, offline data are introduced on the basis of the phasic policy gradient (PPG), thus reducing the time cost of training and enabling the model to converge quickly. In this work, SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to off-policy settings, and links these bounds to the clipping mechanism used by PPG. Findings A series of theoretical and experimental demonstrations shows that the algorithm provides better performance by effectively balancing the competing goals of stability and sample efficiency. |
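The clipping mechanism the abstract refers to can be illustrated with a minimal sketch of the PPO/PPG-style clipped surrogate objective. Here the importance ratio between the current policy and an older behaviour policy is what allows samples to be reused off-policy, and clipping that ratio bounds the size of the policy update. This is a generic sketch of the clipped objective, not the authors' SR-PPG implementation; the function name and signature are hypothetical.

```python
import numpy as np

def clipped_surrogate_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective (negated for minimization), as in PPO/PPG.

    logp_new   : log-probabilities of the taken actions under the current policy
    logp_old   : log-probabilities under the (possibly older) behaviour policy
    advantages : advantage estimates for the same state-action pairs
    eps        : clipping range; keeps the importance ratio in [1-eps, 1+eps]
    """
    # Importance sampling ratio r = pi_new(a|s) / pi_behaviour(a|s);
    # this is what permits reuse of samples from an older policy.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping removes the incentive to move the policy far from the
    # behaviour policy, which is the mechanism the off-policy
    # improvement bounds are tied to.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

For identical policies the ratio is 1 and the loss reduces to the negative mean advantage; for a policy that drifts far from the behaviour policy, the clipped term caps the contribution of each reused sample.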
Database: |
Directory of Open Access Journals |
External Link: |
|