Showing 1 - 10 of 117 results for search: "Bellemare, Marc G."
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents …
External link:
http://arxiv.org/abs/2410.11022
The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment …
External link:
http://arxiv.org/abs/2406.00244
Authors:
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected … (A short sketch of the classical SR follows this entry.)
External link:
http://arxiv.org/abs/2402.08530
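For context on the "clean separation of transition structure and reward" mentioned above, here is a minimal sketch of the classical tabular successor representation, not the distributional analogue the paper proposes; the transition matrix P, reward vector r, and discount gamma are illustrative assumptions.

```python
import numpy as np

def successor_representation(P, gamma=0.99):
    """Classical (tabular) successor representation for a fixed policy.

    P: (n, n) state-to-state transition matrix under the policy.
    Returns Psi, where Psi[s, s'] is the expected discounted number of
    visits to s' starting from s: Psi = (I - gamma * P)^{-1}.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# The SR separates transition structure from reward: Psi depends only on the
# dynamics, so any reward vector r yields its value function by one product.
# V = successor_representation(P) @ r
```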
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing … (A minimal replay-sampling sketch follows this entry.)
External link:
http://arxiv.org/abs/2310.03882
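To make the role of the batch size concrete, here is a minimal, generic replay-buffer sketch rather than the paper's experimental setup; the buffer capacity, the batch size of 32, and the hypothetical agent methods in the usage comment are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Minimal replay memory; `batch_size` in sample() is the parameter
    this entry is about: how many transitions feed each gradient update."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

# Usage with a hypothetical value-based agent, one gradient step per
# environment step, batch size fixed up front rather than tuned:
# s, a, r, s2, d = buffer.sample(batch_size=32)
# loss = agent.td_loss(s, a, r, s2, d); agent.optimizer_step(loss)
```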
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between …
External link:
http://arxiv.org/abs/2309.14597
Authors:
Lan, Charline Le, Tu, Stephen, Rowland, Mark, Harutyunyan, Anna, Agarwal, Rishabh, Bellemare, Marc G., Dabney, Will
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, …
External link:
http://arxiv.org/abs/2306.10171
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach … (A tabular QTD sketch follows this entry.)
External link:
http://arxiv.org/abs/2305.18388
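Since this entry and the one at http://arxiv.org/abs/2301.04462 below both concern QTD, here is a minimal tabular sketch of the standard QTD update for policy evaluation; it illustrates the algorithm being analysed, not the papers' results, and the step size, discount, and state encoding are illustrative assumptions (termination handling is omitted).

```python
import numpy as np

def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.99):
    """One tabular QTD update on transition (x, r, x_next).

    theta: array of shape (num_states, m); theta[x] holds m quantile
    estimates of the return distribution at state x.
    """
    m = theta.shape[1]
    tau = (2 * np.arange(m) + 1) / (2 * m)   # quantile midpoints tau_i
    targets = r + gamma * theta[x_next]      # m bootstrapped target samples
    # For each quantile i, move theta[x, i] up by tau_i and down by the
    # fraction of targets that fall below it (the quantile-regression signal).
    below = (targets[None, :] < theta[x][:, None]).astype(float)
    theta[x] += alpha * (tau - below.mean(axis=1))
    return theta
```

A point estimate of the value function can then be read off as the average of the quantile estimates, e.g. `V_hat = theta.mean(axis=1)`.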
Authors:
Farebrother, Jesse, Greaves, Joshua, Agarwal, Rishabh, Lan, Charline Le, Goroshin, Ross, Castro, Pablo Samuel, Bellemare, Marc G.
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than …
External link:
http://arxiv.org/abs/2304.12567
Authors:
Rowland, Mark, Munos, Rémi, Azar, Mohammad Gheshlaghi, Tang, Yunhao, Ostrovski, Georg, Harutyunyan, Anna, Tuyls, Karl, Bellemare, Marc G., Dabney, Will
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes …
External link:
http://arxiv.org/abs/2301.04462
Authors:
Lan, Charline Le, Greaves, Joshua, Farebrother, Jesse, Rowland, Mark, Pedregosa, Fabian, Agarwal, Rishabh, Bellemare, Marc G.
Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix … (A streaming subspace-estimation sketch follows this entry.)
External link:
http://arxiv.org/abs/2212.04025
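As background for this entry, here is a generic Oja-style sketch of estimating a principal subspace from streaming rows of a large matrix; this is not the algorithm proposed in the linked paper, and the learning rate, re-orthonormalization scheme, and subspace dimension k are illustrative assumptions.

```python
import numpy as np

def streaming_principal_subspace(rows, k, lr=0.01, seed=0):
    """Estimate a k-dimensional principal subspace from streaming rows,
    using Oja-style stochastic updates with QR re-orthonormalization."""
    rng = np.random.default_rng(seed)
    d = rows.shape[1]
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal init
    for x in rows:
        W += lr * np.outer(x, x @ W)   # gradient step toward x x^T W
        W, _ = np.linalg.qr(W)         # keep the columns orthonormal
    return W                           # columns span the estimated subspace

# Usage: for small matrices the result can be checked against the top-k
# right singular vectors from np.linalg.svd.
# A = np.random.default_rng(1).standard_normal((10_000, 50))
# W = streaming_principal_subspace(A, k=5)
```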