Showing 1 - 10 of 3,524 for search: '"Bellemare A"'
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents…
External link:
http://arxiv.org/abs/2410.11022
Author:
Bellemare, Sylvain
A niche corner of the Web3 world is increasingly making use of hardware-based Trusted Execution Environments (TEEs) to build decentralized infrastructure. One of the motivations to use TEEs is to go beyond the current performance limitations of…
External link:
http://arxiv.org/abs/2410.03183
The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment…
External link:
http://arxiv.org/abs/2406.00244
Author:
Bellemare-Pepin, Antoine, Lespinasse, François, Thölke, Philipp, Harel, Yann, Mathewson, Kory, Olson, Jay A., Bengio, Yoshua, Jerbi, Karim
The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece…
External link:
http://arxiv.org/abs/2405.13012
Author:
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected…
External link:
http://arxiv.org/abs/2402.08530
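The snippet above invokes the successor representation (SR) as the analogy for separating transition structure from reward. As a minimal, hedged illustration of that idea (not code from the paper; the 3-state transition matrix and reward vector are made up for this sketch), the tabular SR can be computed in closed form and combined with any reward vector to recover a value function:

```python
import numpy as np

# Minimal sketch (not from the paper): the tabular successor representation (SR)
# for a fixed policy pi. M[s, s'] is the expected discounted number of visits
# to s' when starting from s, i.e. M = sum_t gamma^t (P_pi)^t = (I - gamma * P_pi)^(-1).
# The transition matrix below is a made-up 3-state example.

gamma = 0.9
P_pi = np.array([            # P_pi[s, s'] = Pr(S_{t+1} = s' | S_t = s) under pi
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.9, 0.0, 0.1],
])

M = np.linalg.inv(np.eye(3) - gamma * P_pi)   # successor representation

# The SR separates transition structure from reward: for any reward vector r,
# the value function is recovered by a single matrix-vector product.
r = np.array([0.0, 0.0, 1.0])
V = M @ r
print(V)
```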
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing…
External link:
http://arxiv.org/abs/2310.03882
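For readers unfamiliar with the batch-size parameter the snippet above describes, here is a minimal sketch showing how each gradient update draws `batch_size` transitions from a replay memory (the `ReplayBuffer` class and its defaults are illustrative assumptions, not the paper's implementation):

```python
import random
from collections import deque

# Minimal sketch (assumptions only): the role of the batch-size parameter in
# value-based RL with a replay memory. Each gradient update samples
# `batch_size` stored transitions uniformly at random.

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # `batch_size` transitions are drawn per update; the snippet's point is
        # that this value is usually left at a default rather than tuned.
        return random.sample(self.storage, batch_size)

# usage: buffer.sample(batch_size=32) would feed one gradient step
```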
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between…
External link:
http://arxiv.org/abs/2309.14597
Author:
Lan, Charline Le, Tu, Stephen, Rowland, Mark, Harutyunyan, Anna, Agarwal, Rishabh, Bellemare, Marc G., Dabney, Will
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve…
External link:
http://arxiv.org/abs/2306.10171
Author:
Schwarzer, Max, Obando-Ceron, Johan, Courville, Aaron, Bellemare, Marc, Agarwal, Rishabh, Castro, Pablo Samuel
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable…
External link:
http://arxiv.org/abs/2305.19452
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach…
External link:
http://arxiv.org/abs/2305.18388
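As a hedged illustration of the quantile temporal-difference learning (QTD) algorithm named in the snippet above, the sketch below shows one common tabular form of the update (the function name, step size, and toy usage are assumptions, not the paper's implementation): each state keeps m quantile estimates at fixed levels tau_i = (2i - 1)/(2m), which are nudged toward bootstrapped sample targets with quantile-regression weights.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation): tabular
# quantile temporal-difference learning (QTD) for policy evaluation.
# theta[s, i] estimates the tau_i-quantile of the return from state s,
# with fixed levels tau_i = (2i - 1) / (2m).

def qtd_update(theta, s, r, s_next, alpha=0.05, gamma=0.99):
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)
    # Bootstrap targets: one sample target per quantile of the next state.
    targets = r + gamma * theta[s_next]            # shape (m,)
    for i in range(m):
        # Quantile-regression-style TD step: theta[s, i] moves up with weight
        # tau_i when a target exceeds it, and down with weight (1 - tau_i) otherwise.
        indicators = (targets < theta[s, i]).astype(float)
        theta[s, i] += alpha * np.mean(taus[i] - indicators)
    return theta

# usage on a made-up 2-state chain: theta = np.zeros((2, 8)); qtd_update(theta, 0, 1.0, 1)
```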