Showing 1 - 10 of 121 for search: '"Schaul, Tom"'
Author:
Hughes, Edward, Dennis, Michael, Parker-Holder, Jack, Behbahani, Feryal, Mavalankar, Aditi, Shi, Yuge, Schaul, Tom, Rocktäschel, Tim
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this
External link:
http://arxiv.org/abs/2406.04268
Author:
Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Chan, Harris, Comanici, Gheorghe, Flennerhag, Sebastian, Gazeau, Maxime, Holsheimer, Kristian, Horgan, Dan, Laskin, Michael, Lyle, Clare, Masoom, Hussain, McKinney, Kay, Mnih, Volodymyr, Neitz, Alexander, Nikulin, Dmitry, Pardo, Fabio, Parker-Holder, Jack, Quan, John, Rocktäschel, Tim, Sahni, Himanshu, Schaul, Tom, Schroecker, Yannick, Spencer, Stephen, Steigerwald, Richie, Wang, Luyu, Zhang, Lei
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number o
External link:
http://arxiv.org/abs/2312.09187
Author:
Lange, Robert Tjarko, Schaul, Tom, Chen, Yutian, Lu, Chris, Zahavy, Tom, Dalibard, Valentin, Flennerhag, Sebastian
Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution. While they provide a general-purpose tool for optimization, their particular instantiations can be heuris
External link:
http://arxiv.org/abs/2304.03995
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good g
External link:
http://arxiv.org/abs/2302.04693
Author:
Lange, Robert Tjarko, Schaul, Tom, Chen, Yutian, Zahavy, Tom, Dalibard, Valentin, Lu, Chris, Singh, Satinder, Flennerhag, Sebastian
Published in:
11th International Conference on Learning Representations, ICLR 2023
Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are oftentimes heuristic and inflexible - exactly the limitations that meta-learning can a
External link:
http://arxiv.org/abs/2211.11260
We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states w
External link:
http://arxiv.org/abs/2206.00730
Author:
Filos, Angelos, Vértes, Eszter, Marinho, Zita, Farquhar, Gregory, Borsa, Diana, Friesen, Abram, Behbahani, Feryal, Schaul, Tom, Barreto, André, Osindero, Simon
Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of
External link:
http://arxiv.org/abs/2112.04153
Exploration remains a central challenge for reinforcement learning (RL). Virtually all existing methods share the feature of a monolithic behaviour policy that changes only gradually (at best). In contrast, the exploratory behaviours of animals and h
External link:
http://arxiv.org/abs/2108.11811
Scaling issues are mundane yet irritating for practitioners of reinforcement learning. Error scales vary across domains, tasks, and stages of learning; sometimes by many orders of magnitude. This can be detrimental to learning speed and stability, cr
External link:
http://arxiv.org/abs/2105.05347
Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and
External link:
http://arxiv.org/abs/2002.11833