Showing 1 - 10 of 288 for query: '"ABEL, DAVID"'
Black swan events are statistically rare occurrences that carry extremely high risks. The typical view defines black swan events as originating from unpredictable, time-varying environments; however, the community lacks a comprehensive…
External link:
http://arxiv.org/abs/2407.18422
Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding…
External link:
http://arxiv.org/abs/2407.10583
Humans use social context to specify preferences over behaviors, i.e., their reward functions. Yet algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication…
External link:
http://arxiv.org/abs/2405.14769
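For context on the kind of algorithm this abstract refers to, here is a minimal sketch of standard preference-based reward learning under the Bradley-Terry model, the common baseline for inferring a reward model from pairwise preference data. The feature dimensions, data, and step size are illustrative assumptions; this is not the paper's socially-aware method.

```python
# Minimal sketch of Bradley-Terry preference-based reward learning.
# All data and constants below are toy assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 trajectories with 3 features each, and a
# hidden "true" reward that generates the preference labels.
features = rng.normal(size=(5, 3))
true_w = np.array([1.0, -0.5, 2.0])
true_r = features @ true_w

# Preference data: pairs (i, j) meaning trajectory i beat trajectory j.
pairs = [(i, j) for i in range(5) for j in range(5)
         if i != j and true_r[i] > true_r[j]]

w = np.zeros(3)  # learned reward weights
lr = 0.1
for _ in range(500):
    grad = np.zeros(3)
    for i, j in pairs:
        # Bradley-Terry: P(i preferred to j) = sigmoid(r_i - r_j)
        p = 1.0 / (1.0 + np.exp(-(features[i] - features[j]) @ w))
        # Gradient of the pair's log-likelihood w.r.t. w
        grad += (1.0 - p) * (features[i] - features[j])
    w += lr * grad / len(pairs)  # gradient ascent on log-likelihood

# The learned reward should rank trajectories like the true reward.
print(np.argsort(features @ w), np.argsort(true_r))
```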
Author:
Abel, David, Barreto, André, Van Roy, Benjamin, Precup, Doina, van Hasselt, Hado, Singh, Satinder
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating…
External link:
http://arxiv.org/abs/2307.11046
Author:
Abel, David, Barreto, André, van Hasselt, Hado, Van Roy, Benjamin, Precup, Doina, Singh, Satinder
When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we…
External link:
http://arxiv.org/abs/2307.11044
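The straightforward definition quoted above is easy to state in code. The sketch below checks it for a tabular policy: the agent has converged once its action distribution stops changing in every state. The checkpointing scheme and tolerance are illustrative assumptions, not part of the paper.

```python
# Sketch of the standard convergence notion: behavior has stopped
# changing in every state. Checkpoints and tolerance are illustrative.
import numpy as np

def has_converged(policy_history, tol=1e-8):
    """policy_history: list of (num_states, num_actions) arrays,
    the agent's policy at successive checkpoints."""
    if len(policy_history) < 2:
        return False
    # Converged if the last two policies agree in every state.
    return np.max(np.abs(policy_history[-1] - policy_history[-2])) < tol

# Usage: a policy that stops changing is flagged as converged.
p_a = np.array([[0.5, 0.5], [0.9, 0.1]])
p_b = np.array([[0.2, 0.8], [0.9, 0.1]])
print(has_converged([p_a, p_b]))       # False: behavior still changing
print(has_converged([p_a, p_b, p_b]))  # True: policy fixed everywhere
```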
The reward hypothesis posits that "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will…
External link:
http://arxiv.org/abs/2212.10420
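The hypothesis quoted in the abstract is usually formalized as the familiar return-maximization objective. One common discounted rendering, shown here only to fix notation, is:

```latex
% Expected cumulative reward; the discounted form is one common
% convention among several, not the only one.
\[
  \max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \right],
  \qquad \gamma \in [0, 1)
\]
```

Here R denotes the scalar reward signal and the maximization ranges over the agent's policies.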
Author:
Luketina, Jelena, Flennerhag, Sebastian, Schroecker, Yannick, Abel, David, Zahavy, Tom, Singh, Satinder
Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments…
External link:
http://arxiv.org/abs/2209.06159
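To make the term concrete, here is a toy sketch of the meta-gradient idea in the spirit of Xu et al. (2018): a hyperparameter, here a step size, is itself adapted by gradient descent on the loss measured after the inner update. The quadratic objective and all constants are toy assumptions, not the paper's setup.

```python
# Toy sketch of meta-gradient hyperparameter adaptation: the step
# size is tuned by differentiating the post-update loss w.r.t. it.
import numpy as np

theta = 5.0             # inner parameter
log_eta = np.log(0.05)  # meta-parameter: log step size (kept positive)

def loss(x):
    return 0.5 * x ** 2  # toy inner objective; its gradient is x

for step in range(100):
    eta = np.exp(log_eta)
    grad = theta                     # d loss / d theta
    theta_new = theta - eta * grad   # inner SGD update
    # Meta-gradient: d loss(theta_new) / d eta = theta_new * (-grad),
    # then the chain rule through eta = exp(log_eta) multiplies by eta.
    meta_grad = theta_new * (-grad) * eta
    log_eta -= 0.1 * meta_grad       # outer (meta) gradient step
    theta = theta_new

print(f"theta={theta:.4f}, adapted step size={np.exp(log_eta):.4f}")
```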
Author:
Abel, David
Published in:
Doctoral Dissertation, Department of Computer Science, Brown University, 2020
Reinforcement learning defines the problem facing agents that learn to make good decisions through action and observation alone. To be effective problem solvers, such agents must efficiently explore vast worlds, assign credit from delayed feedback, and…
External link:
http://arxiv.org/abs/2203.00397
Author:
Abel, David Lynn
Published in:
Studies in History and Philosophy of Science, October 2024, 107:54-63
Author:
Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions…
External link:
http://arxiv.org/abs/2111.00876