Showing 1 - 5 of 5 for search: '"Hu, Dailin"'
Author:
Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy
Published in:
ICML 2022
In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble…
External link:
http://arxiv.org/abs/2209.07670
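The abstract above only hints at the ensemble idea, so here is a minimal NumPy sketch (not the paper's algorithm; the action values and noise model are made-up assumptions) of why averaging an ensemble of target estimates before taking the max damps the overestimation that maxing a single noisy estimate produces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 actions whose true next-state values are all 0,
# observed only through noisy value estimates (e.g. independent Q-networks).
n_actions, n_ensemble, n_trials = 5, 10, 10_000
true_q = np.zeros(n_actions)

single_bias, ensemble_bias = [], []
for _ in range(n_trials):
    noisy = true_q + rng.normal(0.0, 1.0, size=(n_ensemble, n_actions))
    # Single-estimate target: max over one noisy estimate (plain Q-Learning style).
    single_bias.append(noisy[0].max() - true_q.max())
    # Ensemble-mean target: average the estimates first, then take the max.
    ensemble_bias.append(noisy.mean(axis=0).max() - true_q.max())

print(f"mean overestimation, single estimate : {np.mean(single_bias):.3f}")
print(f"mean overestimation, ensemble mean   : {np.mean(ensemble_bias):.3f}")
```

Averaging shrinks the variance of each action's estimate, so the max over actions picks up far less spurious positive bias.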
Published in:
NeurIPS 2021 Deep RL Workshop
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$…
External link:
http://arxiv.org/abs/2112.02852
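For context on the "temperature Lagrange term" the abstract mentions, below is a minimal PyTorch sketch of the standard SAC temperature update, assuming sampled-action log-probabilities are available; the batch, learning rate, and target entropy here are illustrative, and this is not the scheduling method proposed in the linked paper.

```python
import torch

# A minimal sketch of the usual SAC temperature (alpha) update; names are illustrative.
log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

target_entropy = -4.0                # common heuristic: -|action_dim|
log_probs = torch.randn(256) - 2.0   # stand-in for log pi(a|s) over a sampled batch

# Lagrange-style temperature loss: drives the policy entropy toward the target.
alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
optimizer.zero_grad()
alpha_loss.backward()
optimizer.step()

alpha = log_alpha.exp().item()
print(f"updated temperature alpha = {alpha:.4f}")
```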
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness. Most MaxEnt RL methods, however…
External link:
http://arxiv.org/abs/2111.14204
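As a pointer to the reward/entropy trade-off mentioned above, here is a small sketch of the soft (entropy-regularized) state value used in Soft Q-Learning, V(s) = α log Σ_a exp(Q(s, a)/α); the Q-values are made up, and the snippet only illustrates how the temperature α interpolates between a hard max and an entropy-weighted value. It is not code from the linked paper.

```python
import numpy as np

def soft_value(q_values: np.ndarray, alpha: float) -> float:
    """Soft state value from Soft Q-Learning: V(s) = alpha * logsumexp(Q(s, .) / alpha)."""
    scaled = q_values / alpha
    m = scaled.max()  # numerically stable log-sum-exp
    return alpha * (m + np.log(np.exp(scaled - m).sum()))

q = np.array([1.0, 0.9, 0.2])
for alpha in (0.01, 0.1, 1.0):
    # Small alpha ~ hard max over Q; larger alpha weights entropy more heavily.
    print(f"alpha={alpha:<4} soft value = {soft_value(q, alpha):.3f}")
```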
Author:
Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy
Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of an unfamiliar…
External link:
http://arxiv.org/abs/2110.14818
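To make the "TD target" in the abstract concrete, here is a minimal tabular Q-Learning update in its generic textbook form (not the method of the linked paper); the state/action counts and the sample transition are hypothetical.

```python
import numpy as np

gamma, lr = 0.99, 0.1
q_table = np.zeros((10, 4))  # hypothetical: 10 states, 4 actions

def td_update(s: int, a: int, r: float, s_next: int) -> None:
    # The TD target takes a max over estimated next-state values; with noisy
    # estimates for rarely visited states this max tends to overestimate.
    td_target = r + gamma * q_table[s_next].max()
    q_table[s, a] += lr * (td_target - q_table[s, a])

td_update(s=0, a=2, r=1.0, s_next=3)
print(q_table[0])
```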
Academic article
This result is not available to unauthenticated users; sign in to view it.