Showing 1 - 10
of 6,287
for the search: '"Dabney, A"'
Navigating multiple tasks—for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning—requires some notion of adaptation. Evolution over timescales of millenni…
External link:
http://arxiv.org/abs/2408.08446
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, Martens, James, van Hasselt, Hado, Pascanu, Razvan, Dabney, Will
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestim…
External link:
http://arxiv.org/abs/2407.01800
Author:
Khetarpal, Khimya, Guo, Zhaohan Daniel, Pires, Bernardo Avila, Tang, Yunhao, Lyle, Clare, Rowland, Mark, Heess, Nicolas, Borsa, Diana, Guez, Arthur, Dabney, Will
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYO…
External link:
http://arxiv.org/abs/2406.02035
Author:
Tang, Yunhao, Guo, Daniel Zhaohan, Zheng, Zeyu, Calandriello, Daniele, Cao, Yuan, Tarassov, Eugene, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal, Cheng, Yong, Dabney, Will
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of rewar…
External link:
http://arxiv.org/abs/2405.08448
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, van Hasselt, Hado, Pascanu, Razvan, Martens, James, Dabney, Will
Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this assumption is…
External link:
http://arxiv.org/abs/2402.18762
Author:
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected…
External link:
http://arxiv.org/abs/2402.08530
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhan…
External link:
http://arxiv.org/abs/2402.07598
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces…
External link:
http://arxiv.org/abs/2402.05766
Author:
Holmes, Laurens, Jr., Enguancho, Elias Malachi, Hinson, Rakinya, Williams, Justin, Nelson, Carlin, Whaley, Kayla Janae, Dabney, Kirk, Williams, Johnette, Dias, Emanuelle Medeiros
Published in:
International Journal of Human Rights in Healthcare, 2022, Vol. 17, Issue 4, pp. 367-377.
External link:
http://www.emeraldinsight.com/doi/10.1108/IJHRH-03-2022-0017
Author:
Dabney, Shane
Published in:
Cityscape, 2024 Jan 01. 26(2), 401-412.
External link:
https://www.jstor.org/stable/48785828