Showing 1 - 10 of 2,681 results
for the search: '"P. A. Brantley"'
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function, and can therefore be thought of as the sequential generalization of a Generative Adversarial Network (GAN) …
External link:
http://arxiv.org/abs/2410.13855
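The two-player zero-sum framing mentioned above is commonly written in the GAIL style, with a policy playing against a discriminator. A sketch in standard notation (not necessarily this paper's exact formulation):

```latex
% GAIL-style adversarial imitation objective: the learner \pi minimizes,
% the discriminator D maximizes, with \pi_E the expert policy.
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\!\left[\log D(s,a)\right]
  + \mathbb{E}_{\pi_E}\!\left[\log\big(1 - D(s,a)\big)\right]
```

The discriminator plays the role of the adversarially chosen cost function, scoring state-action pairs by how expert-like they appear.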
Large Language Models (LLMs) can learn new tasks through in-context supervised learning (i.e., ICL). This work studies if this ability extends to in-context reinforcement learning (ICRL), where models are not given gold labels in context, but only …
External link:
http://arxiv.org/abs/2410.05362
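The ICL-vs-ICRL distinction above can be illustrated with a toy loop: instead of gold labels, the context accumulates the model's own (action, reward) history. Everything here is a hypothetical stand-in (a greedy bandit policy in place of an LLM, a two-armed reward function in place of a task), not the paper's method:

```python
import random

def icrl_episode(policy, env_reward, n_rounds=20, seed=0):
    """Toy in-context RL loop: the 'model' conditions on its own past
    (action, reward) pairs rather than on gold labels."""
    rng = random.Random(seed)
    history = []  # in-context feedback: (action, reward) pairs
    total = 0.0
    for _ in range(n_rounds):
        action = policy(history, rng)
        reward = env_reward(action)
        history.append((action, reward))
        total += reward
    return total, history

def greedy_policy(history, rng, actions=("A", "B")):
    """Naive exploit-best-so-far policy standing in for an LLM."""
    if not history or rng.random() < 0.2:  # explore occasionally
        return rng.choice(actions)
    # pick the action with the highest average observed reward
    by_action = {a: [r for act, r in history if act == a] for a in actions}
    return max(actions,
               key=lambda a: sum(by_action[a]) / len(by_action[a])
               if by_action[a] else 0.0)

total, history = icrl_episode(greedy_policy,
                              lambda a: 1.0 if a == "B" else 0.0)
```

The point of the sketch is only the interface: the policy never sees which action was "correct", only scalar rewards for its own past choices.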
Author:
Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., Sun, Wen
Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works …
External link:
http://arxiv.org/abs/2410.04612
Author:
Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., Sun, Wen
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO …
External link:
http://arxiv.org/abs/2404.16767
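For reference, the PPO objective the snippet above alludes to is the standard clipped surrogate (standard form, not specific to this paper):

```latex
% PPO's clipped surrogate objective, with probability ratio
% r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)
% and advantage estimate \hat{A}_t:
L^{\mathrm{CLIP}}(\theta) =
  \mathbb{E}_t\!\left[\min\big(r_t(\theta)\,\hat{A}_t,\;
  \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\right]
```

Clipping the ratio keeps each policy update close to the data-collecting policy, which is what makes PPO stable enough to serve as a general-purpose work-horse.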
Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms …
External link:
http://arxiv.org/abs/2404.08513
Author:
Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Brantley, Kianté, Misra, Dipendra, Lee, Jason D., Sun, Wen
Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude 3 Opus. This framework often consists of two steps: learning a reward model …
External link:
http://arxiv.org/abs/2404.08495
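The first of the two steps named above is typically a Bradley-Terry fit of a reward model on preference pairs. A sketch in standard notation (not necessarily this paper's exact formulation):

```latex
% Step 1 of the usual RLHF pipeline: fit a reward model r_\phi on
% preference pairs, where y_w is preferred over y_l for prompt x
% and \sigma is the logistic function:
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
  \log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\right]
```

Step 2 then optimizes the policy against the learned reward, typically with an RL method such as PPO.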
Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit …
External link:
http://arxiv.org/abs/2404.03673
We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This optimal-transport-inspired distance facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited …
External link:
http://arxiv.org/abs/2403.01660
This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task requires …
External link:
http://arxiv.org/abs/2402.17793
Recent developments in LLMs offer new opportunities for assisting authors in improving their work. In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft. While initial methods …
External link:
http://arxiv.org/abs/2402.10886