Search results

Report

Boundless Socratic Learning with Language Games

Author: Schaul, Tom

An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its coverage of experience/data is broad enough, and (c) it

External link: http://arxiv.org/abs/2411.16905

Report

Open-Endedness is Essential for Artificial Superhuman Intelligence

Author: Hughes, Edward, Dennis, Michael, Parker-Holder, Jack, Behbahani, Feryal, Mavalankar, Aditi, Shi, Yuge, Schaul, Tom, Rocktäschel, Tim

In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this

External link: http://arxiv.org/abs/2406.04268

Report

Vision-Language Models as a Source of Rewards

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number o

External link: http://arxiv.org/abs/2312.09187

Report

Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

Author: Lange, Robert Tjarko, Schaul, Tom, Chen, Yutian, Lu, Chris, Zahavy, Tom, Dalibard, Valentin, Flennerhag, Sebastian

Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution. While they provide a general-purpose tool for optimization, their particular instantiations can be heuris

External link: http://arxiv.org/abs/2304.03995

Report

Scaling Goal-based Exploration via Pruning Proto-goals

Author: Bagaria, Akhil, Jiang, Ray, Kumar, Ramana, Schaul, Tom

One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good g

External link: http://arxiv.org/abs/2302.04693

Academic article

Exploring Parameter Space in Reinforcement Learning

Author: Rückstieß, Thomas, Sehnke, Frank, Schaul, Tom, Wierstra, Daan, Sun, Yi, Schmidhuber, Jürgen

Published in: Paladyn, Vol 1, Iss 1, Pp 14-24 (2010)

This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploratio

External link: https://doaj.org/article/1d7d7f4ee4c14115b57878db43f50fab

Report

Discovering Evolution Strategies via Meta-Black-Box Optimization

Author: Lange, Robert Tjarko, Schaul, Tom, Chen, Yutian, Zahavy, Tom, Dalibard, Valentin, Lu, Chris, Singh, Satinder, Flennerhag, Sebastian

Published in: 11th International Conference on Learning Representations, ICLR 2023

Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often times heuristic and inflexible - exactly the limitations that meta-learning can a

External link: http://arxiv.org/abs/2211.11260

Report

The Phenomenon of Policy Churn

Author: Schaul, Tom, Barreto, André, Quan, John, Ostrovski, Georg

We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states w

External link: http://arxiv.org/abs/2206.00730

Report

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Author: Filos, Angelos, Vértes, Eszter, Marinho, Zita, Farquhar, Gregory, Borsa, Diana, Friesen, Abram, Behbahani, Feryal, Schaul, Tom, Barreto, André, Osindero, Simon

Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of

External link: http://arxiv.org/abs/2112.04153

Report

When should agents explore?

Author: Pîslar, Miruna, Szepesvari, David, Ostrovski, Georg, Borsa, Diana, Schaul, Tom

Exploration remains a central challenge for reinforcement learning (RL). Virtually all existing methods share the feature of a monolithic behaviour policy that changes only gradually (at best). In contrast, the exploratory behaviours of animals and h

External link: http://arxiv.org/abs/2108.11811
