Description: |
In deep reinforcement learning, off-policy data help reduce on-policy interaction with the environment, and the trust region policy optimization (TRPO) method is effective at stabilizing the policy optimization procedure. In this article, we propose an off-policy TRPO method, termed off-policy TRPO, which exploits both on- and off-policy data and guarantees the monotonic improvement of policies. To this end, we develop a surrogate objective function that incorporates both on- and off-policy data while preserving the monotonic improvement guarantee. We then optimize this surrogate objective by approximately solving a constrained optimization problem under arbitrary parameterization with finite samples. We conduct experiments on representative continuous control tasks from OpenAI Gym and MuJoCo. The results show that the proposed off-policy TRPO outperforms other trust region policy-based methods that use off-policy data on the majority of continuous control tasks.