Showing 1 - 10 of 3,524 for search: '"Bellemare A"'
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents…
External link:
http://arxiv.org/abs/2410.11022
Author:
Bellemare, Sylvain
A niche corner of the Web3 world is increasingly making use of hardware-based Trusted Execution Environments (TEEs) to build decentralized infrastructure. One of the motivations to use TEEs is to go beyond the current performance limitations of…
External link:
http://arxiv.org/abs/2410.03183
The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment…
External link:
http://arxiv.org/abs/2406.00244
Author:
Bellemare-Pepin, Antoine, Lespinasse, François, Thölke, Philipp, Harel, Yann, Mathewson, Kory, Olson, Jay A., Bengio, Yoshua, Jerbi, Karim
The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece…
External link:
http://arxiv.org/abs/2405.13012
Author:
Wiltzer, Harley, Farebrother, Jesse, Gretton, Arthur, Tang, Yunhao, Barreto, André, Dabney, Will, Bellemare, Marc G., Rowland, Mark
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected…
External link:
http://arxiv.org/abs/2402.08530
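The snippet above invokes the successor representation (SR) as the analogy for separating transition structure from reward. As a minimal, hedged illustration of that idea (not code from the paper; the 3-state transition matrix and reward vector are made up for this sketch), the tabular SR can be computed in closed form and combined with any reward vector to recover a value function:

```python
import numpy as np

# Minimal sketch (not from the paper): the tabular successor representation (SR)
# for a fixed policy pi. M[s, s'] is the expected discounted number of visits
# to s' when starting from s, i.e. M = sum_t gamma^t (P_pi)^t = (I - gamma * P_pi)^(-1).
# The transition matrix below is a made-up 3-state example.

gamma = 0.9
P_pi = np.array([            # P_pi[s, s'] = Pr(S_{t+1} = s' | S_t = s) under pi
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.9, 0.0, 0.1],
])

M = np.linalg.inv(np.eye(3) - gamma * P_pi)   # successor representation

# The SR separates transition structure from reward: for any reward vector r,
# the value function is recovered by a single matrix-vector product.
r = np.array([0.0, 0.0, 1.0])
V = M @ r
print(V)
```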
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing…
External link:
http://arxiv.org/abs/2310.03882
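For readers unfamiliar with the batch-size parameter the snippet above describes, here is a minimal sketch showing how each gradient update draws `batch_size` transitions from a replay memory (the `ReplayBuffer` class and its defaults are illustrative assumptions, not the paper's implementation):

```python
import random
from collections import deque

# Minimal sketch (assumptions only): the role of the batch-size parameter in
# value-based RL with a replay memory. Each gradient update samples
# `batch_size` stored transitions uniformly at random.

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # `batch_size` transitions are drawn per update; the snippet's point is
        # that this value is usually left at a default rather than tuned.
        return random.sample(self.storage, batch_size)

# usage: buffer.sample(batch_size=32) would feed one gradient step
```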
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between…
External link:
http://arxiv.org/abs/2309.14597
Author:
Lan, Charline Le, Tu, Stephen, Rowland, Mark, Harutyunyan, Anna, Agarwal, Rishabh, Bellemare, Marc G., Dabney, Will
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve…
External link:
http://arxiv.org/abs/2306.10171
Author:
Schwarzer, Max, Obando-Ceron, Johan, Courville, Aaron, Bellemare, Marc, Agarwal, Rishabh, Castro, Pablo Samuel
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable…
External link:
http://arxiv.org/abs/2305.19452
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach…
External link:
http://arxiv.org/abs/2305.18388
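As a hedged illustration of the quantile temporal-difference learning (QTD) algorithm named in the snippet above, the sketch below shows one common tabular form of the update (the function name, step size, and toy usage are assumptions, not the paper's implementation): each state keeps m quantile estimates at fixed levels tau_i = (2i - 1)/(2m), which are nudged toward bootstrapped sample targets with quantile-regression weights.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation): tabular
# quantile temporal-difference learning (QTD) for policy evaluation.
# theta[s, i] estimates the tau_i-quantile of the return from state s,
# with fixed levels tau_i = (2i - 1) / (2m).

def qtd_update(theta, s, r, s_next, alpha=0.05, gamma=0.99):
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)
    # Bootstrap targets: one sample target per quantile of the next state.
    targets = r + gamma * theta[s_next]            # shape (m,)
    for i in range(m):
        # Quantile-regression-style TD step: theta[s, i] moves up with weight
        # tau_i when a target exceeds it, and down with weight (1 - tau_i) otherwise.
        indicators = (targets < theta[s, i]).astype(float)
        theta[s, i] += alpha * np.mean(taus[i] - indicators)
    return theta

# usage on a made-up 2-state chain: theta = np.zeros((2, 8)); qtd_update(theta, 0, 1.0, 1)
```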