Showing 1 - 10 of 88 results for the search: '"Harutyunyan, Anna"'
Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding…
External link:
http://arxiv.org/abs/2407.10583
Author:
Lan, Charline Le, Tu, Stephen, Rowland, Mark, Harutyunyan, Anna, Agarwal, Rishabh, Bellemare, Marc G., Dabney, Will
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such…
External link:
http://arxiv.org/abs/2306.10171
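The record above concerns state representations for value-based RL. As background rather than the paper's own method, the sketch below shows the classic setting such representation work builds on: a linear value estimate over a fixed feature map, updated by TD(0). The feature map and the toy random-walk environment are hypothetical stand-ins.

```python
import numpy as np

def phi(state, n_features=8):
    """Hypothetical fixed feature map: radial-basis features over a 1-D state."""
    centers = np.linspace(0.0, 1.0, n_features)
    return np.exp(-((state - centers) ** 2) / 0.02)

def td0_update(w, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step on a linear value function V(s) = w . phi(s)."""
    v, v_next = w @ phi(s), w @ phi(s_next)
    td_error = r + gamma * v_next - v
    return w + alpha * td_error * phi(s)

# Toy usage: random-walk transitions on [0, 1] with reward equal to the next state.
rng = np.random.default_rng(0)
w = np.zeros(8)
s = 0.5
for _ in range(1000):
    s_next = np.clip(s + rng.normal(scale=0.05), 0.0, 1.0)
    w = td0_update(w, s, r=s_next, s_next=s_next)
    s = s_next
print("V(0.9) ≈", w @ phi(0.9))
```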
Author:
Tang, Yunhao, Kozuno, Tadashi, Rowland, Mark, Harutyunyan, Anna, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the impact of multi-step learning has been relatively limited despite a number of prior efforts…
External link:
http://arxiv.org/abs/2305.18501
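The record above is about multi-step learning. The snippet below is a generic illustration of the n-step return that multi-step methods build on, not the algorithm proposed in the paper; the trajectory and bootstrap values are invented for the example.

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    """
    n-step lookahead target from time t:
        G_t^(n) = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n})
    `rewards` and `values` are per-step lists from one trajectory; values[k]
    bootstraps with the current estimate of V(s_k).
    """
    horizon = min(n, len(rewards) - t)           # truncate at episode end
    g = sum(gamma ** k * rewards[t + k] for k in range(horizon))
    if t + horizon < len(values):                # bootstrap only if a state remains
        g += gamma ** horizon * values[t + horizon]
    return g

# Toy usage with a hypothetical 5-step trajectory.
rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
values  = [0.5, 0.4, 0.9, 0.1, 0.3, 0.0]         # V(s_0) .. V(s_5)
print(n_step_return(rewards, values, t=0, n=3))  # 3-step lookahead from t=0
```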
Author:
Rowland, Mark, Munos, Rémi, Azar, Mohammad Gheshlaghi, Tang, Yunhao, Ostrovski, Georg, Harutyunyan, Anna, Tuyls, Karl, Bellemare, Marc G., Dabney, Will
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes…
External link:
http://arxiv.org/abs/2301.04462
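QTD, analysed in the record above, maintains a set of quantile estimates of the return distribution at each state and nudges each estimate up or down depending on how often sampled Bellman targets fall below it. The tabular sketch below follows that standard update rule; the toy MDP, step size, and number of quantiles are arbitrary choices for illustration.

```python
import numpy as np

def qtd_update(theta, x, r, x_next, alpha=0.05, gamma=0.99):
    """
    One tabular QTD step. theta[x] holds m quantile estimates of the return
    distribution at state x, at levels tau_i = (2i - 1) / (2m). Each estimate
    moves up by tau_i when a sampled target exceeds it and down by (1 - tau_i)
    when a target falls below it, averaged over the m targets.
    """
    m = theta.shape[1]
    tau = (2 * np.arange(m) + 1) / (2 * m)
    targets = r + gamma * theta[x_next]                       # shape (m,)
    below = (targets[None, :] < theta[x][:, None]).mean(axis=1)
    theta[x] += alpha * (tau - below)
    return theta

# Toy usage: 2 states, 5 quantiles per state, one sampled transition.
theta = np.zeros((2, 5))
theta = qtd_update(theta, x=0, r=1.0, x_next=1)
print(theta[0])   # quantile estimates at state 0 after one update
```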
Author:
Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions…
External link:
http://arxiv.org/abs/2111.00876
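The record above asks whether a reward function can "capture" a task. One concrete way to read that question, sketched below with a made-up two-state MDP, is to enumerate the deterministic policies, evaluate each under a candidate reward, and check whether exactly the desired policies come out optimal. This is only an illustration of the question, not the paper's formal constructions.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[a][s, s'] are transition probabilities.
gamma = 0.9
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.1, 0.9], [0.7, 0.3]])}
R = {0: np.array([0.0, 0.0]),     # candidate reward r(s, a), indexed R[a][s]
     1: np.array([1.0, -1.0])}

def policy_value(pi):
    """Exact value of a deterministic policy pi (tuple: state -> action)."""
    P_pi = np.array([P[pi[s]][s] for s in range(2)])
    r_pi = np.array([R[pi[s]][s] for s in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Which deterministic policies does this reward make optimal (from state 0)?
policies = list(itertools.product([0, 1], repeat=2))
values = {pi: policy_value(pi) for pi in policies}
best = max(v[0] for v in values.values())
optimal = [pi for pi, v in values.items() if np.isclose(v[0], best)]
print("optimal policies under this reward:", optimal)

# The task "take action 1 in state 0 and action 0 in state 1" is captured by
# this reward iff `optimal` is exactly [(1, 0)].
print("captures the example task:", optimal == [(1, 0)])
```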
Author:
Mesnard, Thomas, Weber, Théophane, Viola, Fabio, Thakoor, Shantanu, Saade, Alaa, Harutyunyan, Anna, Dabney, Will, Stepleton, Tom, Heess, Nicolas, Guez, Arthur, Moulines, Éric, Hutter, Marcus, Buesing, Lars, Munos, Rémi
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors…
External link:
http://arxiv.org/abs/2011.09464
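The record above frames credit assignment as separating skill from luck. A standard way to strip some of the "luck" out of a policy-gradient signal, which hindsight-conditioned baselines refine further, is to subtract a baseline from the return-to-go before crediting the action. The sketch below shows that vanilla baseline-subtracted estimator on made-up numbers; it is context for the problem, not the paper's counterfactual method.

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Discounted return from each timestep to the end of the episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return np.array(out[::-1])

def advantages(rewards, baselines, gamma=0.99):
    """
    Baseline-subtracted credit: how much better the outcome was than what the
    baseline predicted for that state. The baseline absorbs reward variation
    the action did not cause, leaving a lower-variance learning signal.
    """
    return returns_to_go(rewards, gamma) - np.asarray(baselines)

# Toy episode: a "lucky" +5 arrives at the end regardless of the actions taken.
rewards   = [0.0, 1.0, 0.0, 5.0]
baselines = [4.5, 4.8, 4.9, 5.0]   # hypothetical value predictions per state
print(advantages(rewards, baselines))
```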
Reinforcement learning is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training…
External link:
http://arxiv.org/abs/2011.01297
Author:
Harutyunyan, Anna, Dabney, Will, Mesnard, Thomas, Azar, Mohammad, Piot, Bilal, Heess, Nicolas, van Hasselt, Hado, Wayne, Greg, Singh, Satinder, Precup, Doina, Munos, Remi
We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed…
External link:
http://arxiv.org/abs/1912.02503
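The record above proposes crediting past decisions by how likely they were to have led to the observed outcome. One way to operationalize that idea, sketched below under my own reading of the abstract, is to compare the policy's foresight probability of an action with a hindsight probability conditioned on the outcome; the hindsight probability would be a learned model in practice, and the exact estimator form here is an illustrative assumption rather than a statement of the paper's method.

```python
def hindsight_advantage(z, pi_a, h_a):
    """
    Credit for action a given an observed outcome z (e.g. the return): weight
    the outcome by how much more likely a looks in hindsight (h_a) than it was
    under the policy in foresight (pi_a). If the outcome says nothing about
    the action, h_a == pi_a and the credit is zero.
    """
    return (1.0 - pi_a / h_a) * z

# Toy numbers: the policy picked the action with probability 0.25, but given
# the good outcome observed (z = 2.0), a hindsight model says the action was
# taken with probability 0.5, so the action receives positive credit.
print(hindsight_advantage(z=2.0, pi_a=0.25, h_a=0.5))   # 1.0
```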
Author:
Rowland, Mark, Harutyunyan, Anna, van Hasselt, Hado, Borsa, Diana, Schaul, Tom, Munos, Rémi, Dabney, Will
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy…
External link:
http://arxiv.org/abs/1910.07479
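The record above builds its framework around conditional expectations of importance sampling ratios. For context, the sketch below computes the ordinary per-decision importance weights (cumulative products of pi/mu ratios) that such conditional estimators refine; the behaviour- and target-policy probabilities are invented for the example, and this is the baseline estimator, not the paper's framework.

```python
import numpy as np

def importance_weights(pi_probs, mu_probs):
    """
    Cumulative importance sampling ratios along one trajectory:
        rho_t = prod_{k <= t} pi(a_k | x_k) / mu(a_k | x_k)
    pi_probs / mu_probs are the target- and behaviour-policy probabilities of
    the actions actually taken.
    """
    ratios = np.asarray(pi_probs) / np.asarray(mu_probs)
    return np.cumprod(ratios)

def off_policy_return_estimate(rewards, pi_probs, mu_probs, gamma=0.99):
    """Per-decision importance-sampled estimate of the target policy's return."""
    rho = importance_weights(pi_probs, mu_probs)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rho * np.asarray(rewards)))

# Toy trajectory generated by a behaviour policy mu, evaluated for a target pi.
rewards  = [0.0, 1.0, 0.5]
pi_probs = [0.9, 0.2, 0.6]
mu_probs = [0.5, 0.5, 0.5]
print(off_policy_return_estimate(rewards, pi_probs, mu_probs))
```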
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to -- as is common -- the policy.
External link:
http://arxiv.org/abs/1902.09996
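The last record is about discovering options by learning their termination condition. As background (the options framework the paper works within, not its discovery algorithm), the sketch below shows the usual representation of an option as an intra-option policy plus a termination probability beta(s), and how execution runs until termination fires; the environment and the option here are placeholder stubs.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """An option in the usual sense: an intra-option policy and a termination
    condition beta(s) giving the probability of stopping in state s."""
    policy: Callable[[int], int]         # state -> action
    termination: Callable[[int], float]  # state -> probability of terminating

def run_option(option, step, state, max_steps=100):
    """Execute an option until its termination condition fires (or a step cap)."""
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        if random.random() < option.termination(state):
            break
    return state

# Placeholder environment and option: walk right on a line, terminate near state 10.
step = lambda s, a: s + a
go_right = Option(policy=lambda s: 1,
                  termination=lambda s: 1.0 if s >= 10 else 0.05)
print(run_option(go_right, step, state=0))
```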