Showing 1 - 10
of 6,287
for the search: '"Dabney, A"'
Navigating multiple tasks—for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning—requires some notion of adaptation. Evolution over timescales of millenni…
External link:
http://arxiv.org/abs/2408.08446
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, Martens, James, van Hasselt, Hado, Pascanu, Razvan, Dabney, Will
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestim…
External link:
http://arxiv.org/abs/2407.01800
Author:
Khetarpal, Khimya, Guo, Zhaohan Daniel, Pires, Bernardo Avila, Tang, Yunhao, Lyle, Clare, Rowland, Mark, Heess, Nicolas, Borsa, Diana, Guez, Arthur, Dabney, Will
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYO…
External link:
http://arxiv.org/abs/2406.02035
Author:
Tang, Yunhao, Guo, Daniel Zhaohan, Zheng, Zeyu, Calandriello, Daniele, Cao, Yuan, Tarassov, Eugene, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal, Cheng, Yong, Dabney, Will
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of rewar…
External link:
http://arxiv.org/abs/2405.08448
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, van Hasselt, Hado, Pascanu, Razvan, Martens, James, Dabney, Will
Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this assumption is…
External link:
http://arxiv.org/abs/2402.18762
Author:
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected…
External link:
http://arxiv.org/abs/2402.08530
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhan…
External link:
http://arxiv.org/abs/2402.07598
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces…
External link:
http://arxiv.org/abs/2402.05766
Author:
Holmes, Laurens, Jr., Enguancho, Elias Malachi, Hinson, Rakinya, Williams, Justin, Nelson, Carlin, Whaley, Kayla Janae, Dabney, Kirk, Williams, Johnette, Dias, Emanuelle Medeiros
Published in:
International Journal of Human Rights in Healthcare, 2022, Vol. 17, Issue 4, pp. 367-377.
External link:
http://www.emeraldinsight.com/doi/10.1108/IJHRH-03-2022-0017
Author:
Dabney, Shane
Published in:
Cityscape, 2024 Jan 01. 26(2), 401-412.
External link:
https://www.jstor.org/stable/48785828