Showing 1 - 10 of 56 for search: '"Genewein, Tim"'
Author:
Ruoss, Anian, Delétang, Grégoire, Medapati, Sourabh, Grau-Moya, Jordi, Wenliang, Li Kevin, Catt, Elliot, Reid, John, Genewein, Tim
The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike trad…
External link:
http://arxiv.org/abs/2402.04494
Author:
Grau-Moya, Jordi, Genewein, Tim, Hutter, Marcus, Orseau, Laurent, Delétang, Grégoire, Catt, Elliot, Ruoss, Anian, Wenliang, Li Kevin, Mattern, Christopher, Aitchison, Matthew, Veness, Joel
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of…
External link:
http://arxiv.org/abs/2401.14953
Author:
Delétang, Grégoire, Ruoss, Anian, Duquenne, Paul-Ambroise, Catt, Elliot, Genewein, Tim, Mattern, Christopher, Grau-Moya, Jordi, Wenliang, Li Kevin, Aitchison, Matthew, Orseau, Laurent, Hutter, Marcus, Veness, Joel
It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (la…
External link:
http://arxiv.org/abs/2309.10668
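The prediction–compression equivalence this abstract opens with can be illustrated with a minimal sketch (the function name and numbers below are illustrative, not from the paper): arithmetic coding assigns roughly -log2(p) bits to a symbol the model predicted with probability p, so a model's total code length on a sequence equals its cumulative log loss.

```python
import math

def ideal_code_length_bits(probs):
    """Ideal (arithmetic-coding) code length of a sequence under a
    sequential predictive model.

    `probs` holds the probability the model assigned to each symbol
    that actually occurred; the ideal code length in bits is the
    cumulative log loss: -sum(log2 p).
    """
    return -sum(math.log2(p) for p in probs)

# A model that predicts each observed symbol with p = 0.9 compresses well:
confident = ideal_code_length_bits([0.9] * 10)  # ~1.52 bits total
# A uniform model over a binary alphabet needs 1 bit per symbol:
uniform = ideal_code_length_bits([0.5] * 10)    # exactly 10 bits
```

Better predictors therefore yield shorter codes, which is why improving a sequence model and improving a compressor are the same objective.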
Author:
Ruoss, Anian, Delétang, Grégoire, Genewein, Tim, Grau-Moya, Jordi, Csordás, Róbert, Bennani, Mehdi, Legg, Shane, Veness, Joel
Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training…
External link:
http://arxiv.org/abs/2305.16843
Author:
Genewein, Tim, Delétang, Grégoire, Ruoss, Anian, Wenliang, Li Kevin, Catt, Elliot, Dutordoir, Vincent, Grau-Moya, Jordi, Orseau, Laurent, Hutter, Marcus, Veness, Joel
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to inve…
External link:
http://arxiv.org/abs/2302.03067
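For the simplest setting in which memory-based meta-learning approximates Bayes-optimal prediction — i.i.d. coin flips with an unknown bias and a uniform prior — the Bayes-optimal predictor has a closed form, Laplace's rule of succession. The sketch below is a textbook illustration of that target, not code from the paper:

```python
def laplace_predictor(history):
    """Bayes-optimal next-symbol probability for i.i.d. Bernoulli data
    under a uniform prior on the bias (Laplace's rule of succession):
    P(next = 1 | history) = (ones + 1) / (len(history) + 2).
    """
    ones = sum(history)
    return (ones + 1) / (len(history) + 2)

laplace_predictor([])         # 0.5 (no data: prior mean)
laplace_predictor([1, 1, 1])  # 0.8 (i.e. 4/5)
```

A memory-based meta-learner trained on many such sequences under log loss should converge toward this predictor, which is what makes the closed form a useful reference point.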
Author:
Grau-Moya, Jordi, Delétang, Grégoire, Kunesch, Markus, Genewein, Tim, Catt, Elliot, Li, Kevin, Ruoss, Anian, Cundy, Chris, Veness, Joel, Wang, Jane, Hutter, Marcus, Summerfield, Christopher, Legg, Shane, Ortega, Pedro
Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk…
External link:
http://arxiv.org/abs/2209.15618
Author:
Delétang, Grégoire, Ruoss, Anian, Grau-Moya, Jordi, Genewein, Tim, Wenliang, Li Kevin, Catt, Elliot, Cundy, Chris, Hutter, Marcus, Legg, Shane, Veness, Joel, Ortega, Pedro A.
Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'91…
External link:
http://arxiv.org/abs/2207.02098
Author:
Brekelmans, Rob, Genewein, Tim, Grau-Moya, Jordi, Delétang, Grégoire, Kunesch, Markus, Legg, Shane, Ortega, Pedro
Published in:
TMLR (2022) https://openreview.net/forum?id=berNQMTYWZ
Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbati…
External link:
http://arxiv.org/abs/2203.12592
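The maximum-entropy regularization this abstract mentions has a simple closed-form optimizer that makes the "hedging" intuition concrete: maximizing E[Q] + τ·H(π) over action distributions yields a softmax over Q with temperature τ, which deliberately spreads probability mass instead of committing to the argmax. A minimal sketch (function name and numbers are illustrative, not from the paper):

```python
import math

def maxent_policy(q_values, temperature=1.0):
    """Optimal policy for E[Q] + temperature * entropy(pi):
    a softmax over Q-values with the given temperature.
    Higher temperature spreads mass more uniformly, hedging
    against errors in the Q estimates."""
    m = max(q_values)  # subtract max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

maxent_policy([1.0, 1.0])  # [0.5, 0.5] — ties split evenly
```

As temperature → 0 the policy approaches the greedy argmax; as temperature grows it approaches uniform, trading expected value for robustness.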
Author:
Delétang, Grégoire, Grau-Moya, Jordi, Kunesch, Markus, Genewein, Tim, Brekelmans, Rob, Legg, Shane, Ortega, Pedro A.
We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as a modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be eit…
External link:
http://arxiv.org/abs/2111.02907
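One standard way to make TD(0) risk-sensitive — asymmetric weighting of positive versus negative TD errors, in the style of Mihatsch and Neuneier — can be sketched in a few lines. This is a common construction shown for illustration, not the paper's exact rule (the function name, signature, and parameter values are assumptions):

```python
def risk_sensitive_td_update(v, reward, v_next, alpha=0.1, gamma=0.99, kappa=0.0):
    """One risk-sensitive TD(0) value update.

    Positive and negative TD errors are weighted asymmetrically by
    kappa in (-1, 1): kappa > 0 down-weights good surprises
    (risk-averse), kappa < 0 down-weights bad ones (risk-seeking),
    and kappa = 0 recovers standard TD(0).
    """
    delta = reward + gamma * v_next - v        # ordinary TD error
    weight = (1 - kappa) if delta > 0 else (1 + kappa)
    return v + alpha * weight * delta
```

With kappa = 0 this is the familiar update v ← v + α(r + γv' − v); nonzero kappa biases the learned values pessimistically or optimistically.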
Author:
Ortega, Pedro A., Kunesch, Markus, Delétang, Grégoire, Genewein, Tim, Grau-Moya, Jordi, Veness, Joel, Buchli, Jonas, Degrave, Jonas, Piot, Bilal, Perolat, Julien, Everitt, Tom, Tallec, Corentin, Parisotto, Emilio, Erez, Tom, Chen, Yutian, Reed, Scott, Hutter, Marcus, de Freitas, Nando, Legg, Shane
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive h…
External link:
http://arxiv.org/abs/2110.10819