CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
Author: | Guo, Yunxiao; Long, Han; Duan, Xiaojun; Feng, Kaiyuan; Li, Maochu; Ma, Xiaying |
---|---|
Year of publication: | 2021 |
Subject: | |
Document type: | Working Paper |
Description: | As a popular Deep Reinforcement Learning (DRL) algorithm, Proximal Policy Optimization (PPO) has demonstrated remarkable efficacy in numerous complex tasks. Depending on the penalty mechanism used in its surrogate objective, PPO can be classified into PPO with a KL-divergence penalty (PPO-KL) and PPO with clipping (PPO-Clip). In this paper, we analyze the impact of the asymmetry of the KL divergence on PPO-KL and show that when this asymmetry is pronounced, it misguides the improvement of the surrogate objective. To address this issue, we rewrite the PPO-KL objective in inner-product form and demonstrate that the KL divergence is a Correntropy Induced Metric (CIM) in Euclidean space. We then extend PPO-KL to the Reproducing Kernel Hilbert Space (RKHS), redefine the inner products in the RKHS, and propose the PPO-CIM algorithm. Moreover, we show that PPO-CIM has a lower computational cost in the policy-gradient step and prove that PPO-CIM keeps the new policy within the trust region provided the kernel satisfies certain conditions. Finally, we design experiments on six MuJoCo continuous-action tasks to validate the proposed algorithm. The experimental results confirm that the asymmetry of the KL divergence can affect the policy improvement of PPO-KL and show that PPO-CIM performs better than both PPO-KL and PPO-Clip on most tasks. An illustrative sketch of a CIM-style penalty is given after this record. Comment: This work has been submitted to Elsevier for possible publication |
Database: | arXiv |
External link: |
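
The abstract describes replacing the KL penalty in PPO's surrogate objective with a correntropy-induced metric. Below is a minimal, hypothetical sketch of what such a penalty could look like with a Gaussian kernel; it is not the authors' implementation, and the function names, kernel bandwidth `sigma`, and penalty coefficient `beta` are illustrative assumptions. The paper's RKHS formulation and any adaptive coefficient schedule are not reproduced here.

```python
# Hypothetical sketch (not the paper's code): a PPO-style penalized surrogate where
# the usual KL penalty is swapped for a correntropy-induced-metric (CIM) penalty
# built from a Gaussian kernel. Names and hyperparameters are assumptions.
import torch


def cim_penalty(logp_new: torch.Tensor, logp_old: torch.Tensor,
                sigma: float = 1.0) -> torch.Tensor:
    """Squared CIM between action probabilities under a Gaussian kernel:
    kappa(0) - kappa(p_new - p_old), averaged over the batch (kappa(0) = 1)."""
    diff = logp_new.exp() - logp_old.exp()
    kernel = torch.exp(-diff.pow(2) / (2.0 * sigma ** 2))
    return (1.0 - kernel).mean()


def surrogate_cim(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  beta: float = 1.0, sigma: float = 1.0) -> torch.Tensor:
    """Penalized surrogate: importance-weighted advantage minus a CIM penalty,
    mirroring the structure of the PPO-KL objective with KL replaced by CIM."""
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    return (ratio * advantages).mean() - beta * cim_penalty(logp_new, logp_old, sigma)


if __name__ == "__main__":
    # Toy usage with random log-probabilities and advantages.
    torch.manual_seed(0)
    logp_old = torch.log(torch.rand(64).clamp_min(1e-6))
    logp_new = (logp_old + 0.05 * torch.randn(64)).detach().requires_grad_(True)
    adv = torch.randn(64)
    loss = -surrogate_cim(logp_new, logp_old, adv)  # maximize surrogate => minimize its negative
    loss.backward()
    print(f"surrogate = {-loss.item():.4f}")
```

Because the Gaussian kernel is bounded, the penalty saturates for large probability differences rather than growing without bound as a KL term can, which is one intuitive reading of the symmetry and robustness argument in the abstract.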