Showing 1 - 10 of 1,251
for the search: '"Dipendra, P."'
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach …
External link:
http://arxiv.org/abs/2407.15007
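The first result above describes imitation learning from demonstrations. The simplest instance, behavioral cloning, treats the demonstrations as a supervised dataset of (state, action) pairs and fits a policy by regression. A minimal sketch (toy linear policy, all names and data illustrative, not from the paper):

```python
import numpy as np

def behavioral_cloning(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Fit a linear policy W minimizing ||states @ W - actions||^2
    over the expert demonstrations (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

def policy(W: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Apply the learned linear policy to a state."""
    return state @ W

# Toy expert that acts as action = 2 * state; the cloned policy recovers it.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 3))   # demonstration states
A = 2.0 * S                     # expert actions
W = behavioral_cloning(S, A)
print(np.allclose(policy(W, S), A, atol=1e-6))  # True
```

Behavioral cloning ignores the sequential aspect entirely, which is exactly the weakness that interactive IL methods like those surveyed in the paper address.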
We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally …
External link:
http://arxiv.org/abs/2404.15269
We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction. In contrast to typical approaches that train the system using reward or expert supervision on r…
External link:
http://arxiv.org/abs/2404.09123
Authors:
Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Brantley, Kianté, Misra, Dipendra, Lee, Jason D., Sun, Wen
Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude 3 Opus. This framework often consists of two steps: learning a reward …
External link:
http://arxiv.org/abs/2404.08495
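The abstract above describes the standard two-step RLHF pipeline: first learn a reward model from human preference data, then fine-tune the policy against it with RL. The first step is commonly a Bradley-Terry model, P(a preferred over b) = sigmoid(r(a) - r(b)). A minimal sketch of that reward-learning step (linear reward, toy data, all names illustrative, not the paper's method):

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward(pref_a: np.ndarray, pref_b: np.ndarray,
               lr: float = 0.5, steps: int = 500) -> np.ndarray:
    """Fit a linear reward r(x) = w . x by gradient ascent on the
    Bradley-Terry log-likelihood of the preference pairs (a preferred over b)."""
    w = np.zeros(pref_a.shape[1])
    for _ in range(steps):
        d = pref_a - pref_b                 # feature differences a - b
        p = sigmoid(d @ w)                  # model's P(a preferred over b)
        w += lr * d.T @ (1.0 - p) / len(d)  # gradient of mean log-likelihood
    return w

# Toy preferences: the preferred samples score higher on the first feature.
rng = np.random.default_rng(1)
a = rng.normal(loc=[1.0, 0.0], size=(200, 2))  # preferred responses
b = rng.normal(loc=[0.0, 0.0], size=(200, 2))  # rejected responses
w = fit_reward(a, b)
print("learned reward weights:", w)
```

The learned weight vector places most of its mass on the feature that actually drove the preferences; the second RLHF step (not shown) would then optimize a policy against this reward with an RL algorithm such as PPO.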
We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding …
External link:
http://arxiv.org/abs/2403.13765
We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models …
External link:
http://arxiv.org/abs/2402.07876
Recent advancements in deep learning have led to the development of powerful language models (LMs) that excel in various tasks. Despite these achievements, there is still room for improvement, particularly in enhancing reasoning abilities and incorporating …
External link:
http://arxiv.org/abs/2312.15021
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance this technology, typically resulting in models of increasing …
External link:
http://arxiv.org/abs/2312.13558
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions. Learning from language feedback …
External link:
http://arxiv.org/abs/2312.06853
Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models (LLMs) for text generation. In particular, recent LLMs such as ChatGPT and GPT-4 can engage in fluent conversations with users after finetuning with …
External link:
http://arxiv.org/abs/2306.11816