Showing 1 - 10 of 408 for search: '"ORTEGA, PEDRO"'
Author:
Grau-Moya, Jordi, Delétang, Grégoire, Kunesch, Markus, Genewein, Tim, Catt, Elliot, Li, Kevin, Ruoss, Anian, Cundy, Chris, Veness, Joel, Wang, Jane, Hutter, Marcus, Summerfield, Christopher, Legg, Shane, Ortega, Pedro
Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk…
External link:
http://arxiv.org/abs/2209.15618
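The claim in this abstract, that Bayes-optimality falls out of meta-training as the solution to an ordinary optimization problem, can be illustrated with a toy sketch (not the paper's agents or tasks; the Bernoulli setup below is an assumption made purely for illustration). Averaging prediction loss over tasks drawn from a prior drives the learned predictor toward the Bayes posterior predictive, here Laplace's rule (h + 1) / (n + 2):

import numpy as np

rng = np.random.default_rng(0)
n_tasks, seq_len = 200_000, 5

# "Meta-training": tabulate next-flip frequencies conditioned on the
# sufficient statistic (n flips seen, h heads seen). The log-loss
# minimizer of such a table is the conditional mean, so the Bayes-optimal
# predictor emerges implicitly from the task average.
counts = np.zeros((seq_len, seq_len + 1, 2))
for _ in range(n_tasks):
    theta = rng.uniform()                    # task ~ uniform prior over coin bias
    flips = rng.random(seq_len) < theta      # data from the sampled task
    h = 0
    for n in range(seq_len):
        counts[n, h, int(flips[n])] += 1
        h += int(flips[n])

for n in range(seq_len):
    for h in range(n + 1):
        meta = counts[n, h, 1] / counts[n, h].sum()
        bayes = (h + 1) / (n + 2)            # Laplace's rule (posterior predictive)
        print(f"n={n} h={h}  meta-trained={meta:.3f}  Bayes-optimal={bayes:.3f}")

The tabulated frequencies match Laplace's rule to within sampling noise, even though no posterior is ever written down.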
Author:
Delétang, Grégoire, Ruoss, Anian, Grau-Moya, Jordi, Genewein, Tim, Wenliang, Li Kevin, Catt, Elliot, Cundy, Chris, Hutter, Marcus, Legg, Shane, Veness, Joel, Ortega, Pedro A.
Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'91…
External link:
http://arxiv.org/abs/2207.02098
Author:
Brekelmans, Rob, Genewein, Tim, Grau-Moya, Jordi, Delétang, Grégoire, Kunesch, Markus, Legg, Shane, Ortega, Pedro
Published in:
TMLR (2022) https://openreview.net/forum?id=berNQMTYWZ
Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations…
External link:
http://arxiv.org/abs/2203.12592
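The robustness-as-hedging claim in this abstract has a crisp one-step core, sketched below for a single decision with fixed action values (an illustrative setup, not the paper's full RL setting; the temperature and values are made up). The entropy-regularized best response max_pi <pi, q> + tau * H(pi) is solved in closed form by pi* = softmax(q / tau), with value tau * logsumexp(q / tau): a smoothed maximum that keeps probability on suboptimal actions as insurance against misspecified values.

import numpy as np
from scipy.special import logsumexp, softmax

tau = 0.5                        # regularization temperature
q = np.array([1.0, 0.8, -0.3])   # illustrative action values

pi_star = softmax(q / tau)                 # closed-form regularized best response
value = tau * logsumexp(q / tau)           # smoothed maximum of q

# Sanity check: brute-force search over random policies on the simplex.
rng = np.random.default_rng(0)
pis = rng.dirichlet(np.ones(len(q)), size=200_000)
entropy = -np.sum(pis * np.log(pis), axis=1)
objective = pis @ q + tau * entropy
print(pi_star, value, objective.max())     # the two values agree closely

The brute-force maximum agrees with the closed form to a few decimals, an easy way to sanity-check the log-sum-exp identity.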
Author:
Delétang, Grégoire, Grau-Moya, Jordi, Kunesch, Markus, Genewein, Tim, Brekelmans, Rob, Legg, Shane, Ortega, Pedro A.
We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as a modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be either…
External link:
http://arxiv.org/abs/2111.02907
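As one concrete instance of the kind of update rule this abstract describes, here is a hedged sketch of the classic asymmetric-learning-rate (Mihatsch-Neuneier-style) risk-sensitive TD update, which scales the step size by the sign of the TD error, i.e. by whether the current estimate over- or underestimates the TD target; the paper's exact sigmoidal formulation may differ from this.

import numpy as np

rng = np.random.default_rng(0)
alpha, kappa = 0.05, 0.5   # base step size; kappa in (-1, 1) sets the risk attitude

v = 0.0
for _ in range(50_000):
    reward = rng.normal(1.0, 2.0)          # noisy reward; one state, no bootstrapping
    delta = reward - v                     # TD error: positive means v underestimates
    scale = 1 - kappa if delta > 0 else 1 + kappa
    v += alpha * scale * delta             # kappa > 0 downweights good surprises

print(v)   # settles below the risk-neutral mean of 1.0 when kappa > 0

With kappa > 0 the learned value is pessimistic about the reward noise; kappa < 0 gives the risk-seeking counterpart, and kappa = 0 recovers ordinary risk-neutral TD.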
Author:
Ortega, Pedro A., Kunesch, Markus, Delétang, Grégoire, Genewein, Tim, Grau-Moya, Jordi, Veness, Joel, Buchli, Jonas, Degrave, Jonas, Piot, Bilal, Perolat, Julien, Everitt, Tom, Tallec, Corentin, Parisotto, Emilio, Erez, Tom, Chen, Yutian, Reed, Scott, Hutter, Marcus, de Freitas, Nando, Legg, Shane
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive…
External link:
http://arxiv.org/abs/2110.10819
Author:
Pérez, Juan Antonio, Gonçalves, Gil Rito, Morillo Barragan, Juan Ramón, Fuentes Ortega, Pedro, Caracol Palomo, Antonio M.
Published in:
In Heliyon 15 May 2024 10(9)
Let $\omega$ and $\nu$ be radial weights on the unit disc of the complex plane such that $\omega$ admits the doubling property $\sup_{0\le r<1}\frac{\int_r^1 \omega(s)\,ds}{\int_{\frac{1+r}{2}}^1 \omega(s)\,ds}<\infty$. Consider the one weight inequality…
External link:
http://arxiv.org/abs/2105.08029
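As a concrete check of the doubling condition quoted in the snippet above (a standard example, not taken from the paper): the radial weights $\omega(s) = (1-s)^\alpha$ with $\alpha > -1$ satisfy it with a constant independent of $r$, since

\int_r^1 (1-s)^\alpha \, ds = \frac{(1-r)^{\alpha+1}}{\alpha+1},
\qquad
\int_{\frac{1+r}{2}}^1 (1-s)^\alpha \, ds = \frac{1}{\alpha+1}\Bigl(\frac{1-r}{2}\Bigr)^{\alpha+1},

so the ratio in the supremum equals $2^{\alpha+1}$ for every $r$, and the supremum is finite.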
Author:
Delétang, Grégoire, Grau-Moya, Jordi, Martic, Miljan, Genewein, Tim, McGrath, Tom, Mikulik, Vladimir, Kunesch, Markus, Legg, Shane, Ortega, Pedro A.
As machine learning systems become more powerful, they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a method…
External link:
http://arxiv.org/abs/2103.03938
We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness…
External link:
http://arxiv.org/abs/2102.01685
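For the single-decision case, the value-of-information criterion this abstract refers to can be checked mechanically with d-separation. The sketch below is an illustrative toy (the node names, the networkx encoding, and the simplified criterion are assumptions, not the paper's formalism): an observation X in Pa(D) can carry value of information only if X is d-connected to a utility node given Fa(D) \ {X}.

import networkx as nx

# Toy causal influence diagram: state S, informative observation X of S,
# irrelevant noise observation N, decision D (observing X and N), utility U.
cid = nx.DiGraph([("S", "X"), ("X", "D"), ("N", "D"), ("S", "U"), ("D", "U")])
pa_d = set(cid.predecessors("D"))

for obs in sorted(pa_d):
    conditioning = ({"D"} | pa_d) - {obs}
    # nx.d_separated is renamed nx.is_d_separator in networkx >= 3.3
    separated = nx.d_separated(cid, {obs}, {"U"}, conditioning)
    print(f"{obs}: value of information = {not separated}")
# X is d-connected to U through S (VoI = True); N is screened off (VoI = False).

Here X matters because it reveals the state S that drives the utility, while the noise observation N is screened off, matching the intuition that only observations bearing on utility-relevant variables are worth attending to.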
Author:
Desimone, Paula Mariela, Zonta, Giulia, Giulietti, Giuliana, Ortega, Pedro Paulo, Aldao, Celso Manuel, Simões, Alexandre Zirpoli, Moura, Francisco, Ponce, Miguel Adolfo, Foschini, Cesar Renato
Published in:
In Materials Science & Engineering B January 2024 299