Showing 1 - 10 of 110 for the search: '"Foster, Dylan J."'
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings…
External link:
http://arxiv.org/abs/2410.17904
In this paper, we develop a unified framework for lower bound methods in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's inequality, Le Cam's method, and Assouad's lemma -- have been central…
External link:
http://arxiv.org/abs/2410.05117
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach…
External link:
http://arxiv.org/abs/2407.15007
Author:
Huang, Audrey, Zhan, Wenhao, Xie, Tengyang, Lee, Jason D., Sun, Wen, Krishnamurthy, Akshay, Foster, Dylan J.
Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization…
External link:
http://arxiv.org/abs/2407.13399
Author:
Xie, Tengyang, Foster, Dylan J., Krishnamurthy, Akshay, Rosset, Corby, Awadallah, Ahmed, Rakhlin, Alexander
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to…
External link:
http://arxiv.org/abs/2405.21046
Sample-efficiency and reliability remain major bottlenecks toward wide adoption of reinforcement learning algorithms in continuous settings with high-dimensional perceptual inputs. Toward addressing these challenges, we introduce a new theoretical framework…
External link:
http://arxiv.org/abs/2405.19269
Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators…
External link:
http://arxiv.org/abs/2404.15417
The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adaptively…
External link:
http://arxiv.org/abs/2404.10122
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions.
External link:
http://arxiv.org/abs/2403.15371
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward…
External link:
http://arxiv.org/abs/2403.06571