CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
Author: | Guo, Yunxiao; Long, Han; Duan, Xiaojun; Feng, Kaiyuan; Li, Maochu; Ma, Xiaying |
---|---|
Year of publication: | 2021 |
Subject: | |
Document type: | Working Paper |
Description: | As a popular Deep Reinforcement Learning (DRL) algorithm, Proximal Policy Optimization (PPO) has demonstrated remarkable efficacy in numerous complex tasks. Depending on the penalty mechanism used in its surrogate objective, PPO can be classified into PPO with a KL-divergence penalty (PPO-KL) and PPO with clipping (PPO-Clip). In this paper, we analyze the impact of the asymmetry of the KL divergence on PPO-KL and show that when this asymmetry is pronounced, it misguides the improvement of the surrogate objective. To address this issue, we rewrite the PPO-KL objective in inner-product form and demonstrate that the KL divergence is a Correntropy Induced Metric (CIM) in Euclidean space. We then extend PPO-KL to the Reproducing Kernel Hilbert Space (RKHS), redefine the inner products in the RKHS, and propose the PPO-CIM algorithm. Moreover, we show that PPO-CIM has a lower computational cost in the policy-gradient step and prove that PPO-CIM keeps the new policy within the trust region provided the kernel satisfies certain conditions. Finally, we design experiments on six MuJoCo continuous-action tasks to validate the proposed algorithm. The experimental results confirm that the asymmetry of the KL divergence can affect the policy improvement of PPO-KL and show that PPO-CIM performs better than both PPO-KL and PPO-Clip on most tasks. An illustrative sketch of a CIM-style penalty is given after this record. Comment: This work has been submitted to Elsevier for possible publication |
Database: | arXiv |
External link: |
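
The abstract describes replacing the KL penalty in PPO's surrogate objective with a correntropy-induced metric. Below is a minimal, hypothetical sketch of what such a penalty could look like with a Gaussian kernel; it is not the authors' implementation, and the function names, kernel bandwidth `sigma`, and penalty coefficient `beta` are illustrative assumptions. The paper's RKHS formulation and any adaptive coefficient schedule are not reproduced here.

```python
# Hypothetical sketch (not the paper's code): a PPO-style penalized surrogate where
# the usual KL penalty is swapped for a correntropy-induced-metric (CIM) penalty
# built from a Gaussian kernel. Names and hyperparameters are assumptions.
import torch


def cim_penalty(logp_new: torch.Tensor, logp_old: torch.Tensor,
                sigma: float = 1.0) -> torch.Tensor:
    """Squared CIM between action probabilities under a Gaussian kernel:
    kappa(0) - kappa(p_new - p_old), averaged over the batch (kappa(0) = 1)."""
    diff = logp_new.exp() - logp_old.exp()
    kernel = torch.exp(-diff.pow(2) / (2.0 * sigma ** 2))
    return (1.0 - kernel).mean()


def surrogate_cim(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  beta: float = 1.0, sigma: float = 1.0) -> torch.Tensor:
    """Penalized surrogate: importance-weighted advantage minus a CIM penalty,
    mirroring the structure of the PPO-KL objective with KL replaced by CIM."""
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    return (ratio * advantages).mean() - beta * cim_penalty(logp_new, logp_old, sigma)


if __name__ == "__main__":
    # Toy usage with random log-probabilities and advantages.
    torch.manual_seed(0)
    logp_old = torch.log(torch.rand(64).clamp_min(1e-6))
    logp_new = (logp_old + 0.05 * torch.randn(64)).detach().requires_grad_(True)
    adv = torch.randn(64)
    loss = -surrogate_cim(logp_new, logp_old, adv)  # maximize surrogate => minimize its negative
    loss.backward()
    print(f"surrogate = {-loss.item():.4f}")
```

Because the Gaussian kernel is bounded, the penalty saturates for large probability differences rather than growing without bound as a KL term can, which is one intuitive reading of the symmetry and robustness argument in the abstract.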