Showing 1 - 8 of 8 for search: '"Gan, Yaozhong"'
The imbalance of exploration and exploitation has long been a significant challenge in reinforcement learning. In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents…
External link:
http://arxiv.org/abs/2408.09974
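Background for the trade-off this entry describes (a generic illustration, not taken from the paper; the temperature β is an assumed symbol): a common way to keep exploration alive in policy optimization is to add an entropy bonus to the expected-return objective,

    J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \gamma^{t} r_t\Big] + \beta\,\mathbb{E}_{s \sim \pi_\theta}\big[\mathcal{H}\big(\pi_\theta(\cdot \mid s)\big)\big].

Larger β favors exploration (higher-entropy policies) and smaller β favors exploitation; the abstract's point is that neither extreme learns efficiently.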
Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies is constrained…
External link:
http://arxiv.org/abs/2406.03894
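For reference, the clipped surrogate objective from the original PPO paper (standard background, not this entry's contribution):

    L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.

The clip keeps the importance ratio r_t near 1; that correction is only reliable when the samples come from π_{θ_old} itself, which is why PPO struggles to exploit data from disparate policies, as the abstract notes.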
On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy Optimization…
External link:
http://arxiv.org/abs/2406.03678
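For context, the trust-region problem TRPO solves at each update (standard background; the snippet does not show how Reflective Policy Optimization modifies it):

    \max_{\theta}\ \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}}\Big[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,A^{\pi_{\theta_{\mathrm{old}}}}(s,a)\Big] \quad \text{subject to} \quad \mathbb{E}_{s}\big[D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s)\,\big\|\,\pi_\theta(\cdot \mid s)\big)\big] \le \delta.

Because both the objective and the constraint are estimated from trajectories drawn under π_{θ_old}, every update needs a fresh batch of rollouts, which is the sample inefficiency the abstract describes.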
Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve the robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximated…
External link:
http://arxiv.org/abs/2203.11677
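The action-gap mechanism the abstract refers to is the advantage-learning operator from prior work (a sketch in generic notation; α ∈ [0,1) is the gap-increasing coefficient and T the Bellman optimality operator, symbols not taken from this entry):

    \mathcal{T}_{\mathrm{AL}}\,Q(s,a) = \mathcal{T}Q(s,a) - \alpha\big[\max_{b} Q(s,b) - Q(s,a)\big], \qquad \mathcal{T}Q(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}\big[\max_{b} Q(s',b)\big].

Subtracting the scaled gap penalizes non-maximizing actions, widening the margin between the greedy action and its competitors; the failure mode the abstract raises arises when the greedy action of the approximate Q is not the truly optimal one.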
Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization. Unfortunately, the method tends to be unstable in the case of function approximation. In this…
External link:
http://arxiv.org/abs/2203.10445
Learning complicated value functions in high-dimensional state spaces by function approximation is a challenging task, partially because the max-operator used in temporal difference updates can theoretically cause instability for most linear or nonlinear…
External link:
http://arxiv.org/abs/2012.09456
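The max-operator in question is the one in the standard Q-learning / temporal-difference target (generic notation, with η the step size; not this entry's proposed update):

    Q(s,a) \leftarrow Q(s,a) + \eta\,\big[r + \gamma \max_{b} Q(s',b) - Q(s,a)\big].

Combined with bootstrapping, off-policy data, and function approximation, this hard max is a known ingredient of the "deadly triad" instability, which is why softened alternatives to the max are often studied.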
Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily…
External link:
http://arxiv.org/abs/1901.10314
Published in:
Pattern Recognition, vol. 131, November 2022