Showing 1 - 10 of 187 for search: '"Krishnamurthy, Akshay"'
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings…
External link:
http://arxiv.org/abs/2410.17904
Author:
Huang, Audrey, Zhan, Wenhao, Xie, Tengyang, Lee, Jason D., Sun, Wen, Krishnamurthy, Akshay, Foster, Dylan J.
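A note on the setting sketched in the snippet above: one common formalization (a generic latent-dynamics model, not necessarily the paper's exact assumptions) has a simple hidden state driving the dynamics while the agent sees only rich observations.

```latex
% Generic latent-dynamics model: hidden state z_h evolves simply, while the
% agent observes only a rich signal x_h emitted from z_h. The decoder phi
% exists but is unknown to the agent.
\[
z_{h+1} \sim P(\cdot \mid z_h, a_h), \qquad
x_h \sim q(\cdot \mid z_h), \qquad
z_h = \phi(x_h).
\]
% Objective: a policy acting on observations that is near-optimal for the
% latent-state reward.
\[
\max_{\pi(a_h \mid x_h)} \;
\mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} r(z_h, a_h)\Big].
\]
```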
Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization…
External link:
http://arxiv.org/abs/2407.13399
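For context on the overoptimization phenomenon the snippet mentions, here is the standard KL-regularized RLHF objective (the textbook formulation, not necessarily this paper's algorithm); overoptimization arises when the policy exploits errors in the learned reward model.

```latex
% KL-regularized RLHF: maximize the learned reward \hat{r} while staying close
% to a reference policy pi_ref. Overoptimization: true response quality
% degrades even as \hat{r} keeps increasing.
\[
\max_{\pi}\;
\mathbb{E}_{x \sim \rho,\; y \sim \pi(\cdot \mid x)}\big[\hat{r}(x, y)\big]
\;-\;
\beta\, \mathbb{E}_{x \sim \rho}\Big[
  \mathrm{KL}\big(\pi(\cdot \mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)
\Big].
\]
```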
We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting, a setting that uses linear function approximation to capture value functions and unifies existing models like linear Markov…
External link:
http://arxiv.org/abs/2406.11810
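The linear Bellman completeness condition referenced in the snippet has a standard precise statement: linear value functions must be closed under the Bellman backup.

```latex
% Linear Bellman completeness with features \phi(s, a) \in R^d: applying the
% Bellman operator to any linear Q-function yields another linear Q-function.
\[
\forall\, \theta \in \mathbb{R}^d \;\; \exists\, \theta' \in \mathbb{R}^d:
\quad
\langle \phi(s, a), \theta' \rangle
= r(s, a)
+ \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
  \Big[\max_{a'} \langle \phi(s', a'), \theta \rangle\Big]
\quad \text{for all } (s, a).
\]
% Linear MDPs satisfy this condition, which is why the setting unifies them.
```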
Author:
Xie, Tengyang, Foster, Dylan J., Krishnamurthy, Akshay, Rosset, Corby, Awadallah, Ahmed, Rakhlin, Alexander
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to…
External link:
http://arxiv.org/abs/2405.21046
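As a toy illustration of online exploration from preference feedback (a bandit-scale analogue for intuition only, not the paper's algorithm), the loop below treats "responses" as arms, generates pairwise preferences from a Bradley-Terry model, and deliberately compares under-sampled pairs; all quantities are made up for the example.

```python
# Toy online preference learning: epsilon fraction of rounds compare the
# least-tried pair (exploration); otherwise pit the empirical winner against
# a random challenger (exploitation).
import math
import random

random.seed(0)
true_reward = [0.1, 0.5, 0.9, 0.3]          # hidden quality of each "response"
wins = [[0] * 4 for _ in range(4)]           # wins[i][j]: i preferred over j
counts = [[0] * 4 for _ in range(4)]         # counts[i][j]: comparisons of i, j

def preference(i, j):
    """Bradley-Terry feedback: P(i beats j) = sigmoid(r_i - r_j)."""
    p = 1 / (1 + math.exp(-(true_reward[i] - true_reward[j])))
    return random.random() < p

pairs = [(i, j) for i in range(4) for j in range(4) if i != j]
for t in range(2000):
    if random.random() < 0.2:                # explore: least-compared pair
        i, j = min(pairs, key=lambda ij: counts[ij[0]][ij[1]])
    else:                                    # exploit: current winner vs random
        score = [sum(wins[k]) / max(1, sum(counts[k])) for k in range(4)]
        i = max(range(4), key=lambda k: score[k])
        j = random.choice([k for k in range(4) if k != i])
    counts[i][j] += 1
    counts[j][i] += 1
    if preference(i, j):
        wins[i][j] += 1
    else:
        wins[j][i] += 1

score = [sum(wins[k]) / max(1, sum(counts[k])) for k in range(4)]
print("empirical win rates:", [round(s, 2) for s in score])  # arm 2 leads
```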
Sample-efficiency and reliability remain major bottlenecks toward wide adoption of reinforcement learning algorithms in continuous settings with high-dimensional perceptual inputs. Toward addressing these challenges, we introduce a new theoretical framework…
External link:
http://arxiv.org/abs/2405.19269
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions.
External link:
http://arxiv.org/abs/2403.15371
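A sketch of the kind of harness one can use to probe in-context exploration: show the model a multi-armed-bandit history in its prompt and ask it to pick the next arm. Here `query_llm` is a hypothetical stand-in (a random stub so the scaffold runs); the paper's exact prompts and protocol are not reproduced.

```python
# Bernoulli bandit harness for probing an LLM's in-context exploration.
import random

random.seed(0)
ARM_MEANS = [0.2, 0.5, 0.8]   # arm 2 is best

def query_llm(history):
    """Hypothetical LLM call: given (arm, reward) history, return an arm index.
    Replace this random stub with a real chat-completion request."""
    return random.randrange(len(ARM_MEANS))

history = []
for t in range(100):
    arm = query_llm(history)
    reward = int(random.random() < ARM_MEANS[arm])
    history.append((arm, reward))

best_freq = sum(1 for arm, _ in history if arm == 2) / len(history)
print(f"best-arm frequency: {best_freq:.2f}")
# A model that explores and then exploits should pull arm 2 far more often
# than 1/3 of the time; the random stub above, of course, does not.
```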
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward…
External link:
http://arxiv.org/abs/2403.06571
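One standard way to make "objectives that enable downstream maximization of any reward" precise is a policy-cover guarantee (a generic formalization; the paper's specific objectives are not reproduced here).

```latex
% An approximate policy cover: a small set of policies whose state occupancies
% d^pi jointly cover, up to a factor C, what any single policy can reach.
\[
\forall s: \quad
\max_{\pi \in \Pi_{\mathrm{cov}}} d^{\pi}(s)
\;\ge\; \frac{1}{C}\, \sup_{\pi} d^{\pi}(s).
\]
% Data gathered by rolling out \Pi_{cov} then suffices to optimize any
% downstream reward offline, since no reachable state is left (relatively)
% unexplored.
```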
A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ. As distribution shift typically results in a degradation in performance, much attention has been…
External link:
http://arxiv.org/abs/2401.12216
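In symbols, the phenomenon the snippet describes: a model f is trained under a distribution P but deployed under a different distribution Q, and the degradation is the gap between the two risks (notation assumed here for illustration).

```latex
% Degradation under distribution shift: deployment risk under Q can exceed
% training risk under P when Q differs from P.
\[
\underbrace{\mathbb{E}_{(x,y) \sim Q}\big[\ell(f(x), y)\big]}_{\text{deployment risk}}
\;\;\text{vs.}\;\;
\underbrace{\mathbb{E}_{(x,y) \sim P}\big[\ell(f(x), y)\big]}_{\text{training risk}},
\qquad Q \neq P.
\]
```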
This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the…
External link:
http://arxiv.org/abs/2310.11428
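A minimal sketch of the measurement in question: evaluate long-horizon return after every minibatch SGD step of behavior cloning, so per-step reward oscillation becomes visible. This toy uses a 1-D task with a linear policy (illustrative only; the paper studies deep networks on real control tasks).

```python
# Behavior cloning on x' = x + a with expert a = -0.5 * x; track rollout
# return after each minibatch SGD step on the squared BC loss.
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(w, horizon=100):
    """Long-horizon reward of policy a = w * x; reward -|x| drives x to 0."""
    x, total = 1.0, 0.0
    for _ in range(horizon):
        x = x + w * x
        total += -abs(x)
    return total

states = rng.uniform(-1, 1, size=1000)       # BC dataset: (state, expert action)
actions = -0.5 * states

w, lr, returns = 0.0, 0.5, []
for step in range(200):
    idx = rng.integers(0, len(states), size=16)            # minibatch
    grad = np.mean(2 * (w * states[idx] - actions[idx]) * states[idx])
    w -= lr * grad                                         # SGD step
    returns.append(rollout_return(w))

# Oscillation metric: step-to-step change in return, even late in training.
diffs = np.abs(np.diff(returns[100:]))
print(f"mean |return change| per SGD step (late training): {diffs.mean():.3f}")
```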
We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are either…
External link:
http://arxiv.org/abs/2306.07923
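The usual template for pessimistic OPO in contextual bandits (the generic form; this paper's particular regularizer is not reproduced): maximize an off-policy value estimate minus an uncertainty penalty.

```latex
% Pessimistic offline policy optimization: logged data (x_i, a_i, r_i) from a
% logging policy mu; \widehat{U}(\pi) upper-bounds estimator uncertainty
% (e.g., an empirical-variance term), discouraging policies far from the data.
\[
\hat{\pi} \;=\; \arg\max_{\pi \in \Pi}\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n}
  \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\, r_i}_{\text{IPS value estimate } \hat{V}(\pi)}
\;-\; \alpha\, \widehat{U}(\pi).
\]
```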