Showing 1 - 10 of 1,112 for search: '"Van Hasselt, P A"'
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, Martens, James, van Hasselt, Hado, Pascanu, Razvan, Dabney, Will
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestim
External link:
http://arxiv.org/abs/2407.01800
Author:
Lyle, Clare, Zheng, Zeyu, Khetarpal, Khimya, van Hasselt, Hado, Pascanu, Razvan, Martens, James, Dabney, Will
Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this assumption is
External link:
http://arxiv.org/abs/2402.18762
Author:
Pignatelli, Eduardo, Ferret, Johan, Geist, Matthieu, Mesnard, Thomas, van Hasselt, Hado, Pietquin, Olivier, Toni, Laura
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the re
External link:
http://arxiv.org/abs/2312.01072
Author:
Abel, David, Barreto, André, Van Roy, Benjamin, Precup, Doina, van Hasselt, Hado, Singh, Satinder
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than trea
External link:
http://arxiv.org/abs/2307.11046
Author:
Abel, David, Barreto, André, van Hasselt, Hado, Van Roy, Benjamin, Precup, Doina, Singh, Satinder
When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we
External link:
http://arxiv.org/abs/2307.11044
How to efficiently explore in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions -- for instance to compute an exploration bonus or upper confidence bound. Unfortunat
External link:
http://arxiv.org/abs/2303.04012
To generalize across tasks, an agent should acquire knowledge from past tasks that facilitate adaptation and exploration in future tasks. We focus on the problem of in-context adaptation and exploration, where an agent only relies on context, i.e., h
External link:
http://arxiv.org/abs/2302.04250
Author:
Flennerhag, Sebastian, Zahavy, Tom, O'Donoghue, Brendan, van Hasselt, Hado, György, András, Singh, Satinder
We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for
External link:
http://arxiv.org/abs/2301.03236
Author:
Kapturowski, Steven, Campos, Víctor, Jiang, Ray, Rakićević, Nemanja, van Hasselt, Hado, Blundell, Charles, Badia, Adrià Puigdomènech
The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception. The problem has been subject of research of a large body of work, with performance frequently m
External link:
http://arxiv.org/abs/2209.07550
Author:
Bhumika Aggarwal, Paul Jones, Alejandro Casas, Mauro Gomes, Siwasak Juthong, Diego Litewka, Bernice Ong-Dela Cruz, Alejandra Ramirez-Venegas, Abdullah Sayiner, James van Hasselt, Chris Compton, Lee Tombs, Stephen Weng, Gur Levy
Published in:
Pulmonary Therapy, Vol 10, Iss 2, Pp 183-192 (2024)
Abstract. Introduction: Despite the proven benefits of inhaled corticosteroid (ICS)-containing triple therapy for chronic obstructive pulmonary disease (COPD), clinicians limit patient exposure to ICS due to the risk of pneumonia. However, there are mu
External link:
https://doaj.org/article/dbe59546ea2a4f15b11bdd14b3d17a33