Výsledky vyhledávání

Report

For an unknown finite group $G$ of automorphisms of a finite-dimensional Hilbert space, we find sharp bounds on the number of generic $G$-orbits needed to recover $G$ up to group isomorphism, as well as the number needed to recover $G$ as a concrete

Externí odkaz: http://arxiv.org/abs/2411.17434

Zobrazit plný text záznamu

Report

Diffusing States and Matching Scores: A New Framework for Imitation Learning

Autor: Wu, Runzhe, Chen, Yiding, Swamy, Gokul, Brantley, Kianté, Sun, Wen

Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function, and can therefore be thought of as the sequential generalization of a Generative Adversarial Network (GA

Externí odkaz: http://arxiv.org/abs/2410.13855

Zobrazit plný text záznamu

Report

LLMs Are In-Context Reinforcement Learners

Autor: Monea, Giovanni, Bosselut, Antoine, Brantley, Kianté, Artzi, Yoav

Large Language Models (LLMs) can learn new tasks through in-context supervised learning (i.e., ICL). This work studies if this ability extends to in-context reinforcement learning (ICRL), where models are not given gold labels in context, but only th

Externí odkaz: http://arxiv.org/abs/2410.05362

Zobrazit plný text záznamu

Report

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Autor: Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., Sun, Wen

Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works

Externí odkaz: http://arxiv.org/abs/2410.04612

Zobrazit plný text záznamu

Report

REBEL: Reinforcement Learning via Regressing Relative Rewards

Autor: Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., Sun, Wen

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO

Externí odkaz: http://arxiv.org/abs/2404.16767

Zobrazit plný text záznamu

Report

Adversarial Imitation Learning via Boosting

Autor: Chang, Jonathan D., Sreenivas, Dhruv, Huang, Yingbing, Brantley, Kianté, Sun, Wen

Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al.,, 2019) demonstrating the effectiveness of off-policy learning alg

Externí odkaz: http://arxiv.org/abs/2404.08513

Zobrazit plný text záznamu

Report

Dataset Reset Policy Optimization for RLHF

Autor: Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Brantley, Kianté, Misra, Dipendra, Lee, Jason D., Sun, Wen

Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a rewa

Externí odkaz: http://arxiv.org/abs/2404.08495

Zobrazit plný text záznamu

Report

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Autor: Oertell, Owen, Chang, Jonathan D., Zhang, Yiyi, Brantley, Kianté, Sun, Wen

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit t

Externí odkaz: http://arxiv.org/abs/2404.03673

Zobrazit plný text záznamu

Report

Geometry and Stability of Supervised Learning Problems

Autor: Mémoli, Facundo, Vose, Brantley, Williamson, Robert C.

We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This optimal-transport-inspired distance facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited

Externí odkaz: http://arxiv.org/abs/2403.01660

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání