Showing 1 - 10 of 144 for the search: '"Sharma, Archit"'
Author:
Tajwar, Fahim, Singh, Anikait, Sharma, Archit, Rafailov, Rafael, Schneider, Jeff, Xie, Tengyang, Ermon, Stefano, Finn, Chelsea, Kumar, Aviral
Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning… (a sketch of one contrastive objective follows this entry).
External link:
http://arxiv.org/abs/2404.14367
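The contrastive approach named in the abstract can be made concrete with a DPO-style objective. This is a minimal, hedged sketch assuming summed per-sequence log-probabilities are already available; it shows one common contrastive formulation, not necessarily the exact one the paper analyzes.

```python
# Minimal sketch of a DPO-style contrastive preference loss.
# Inputs are summed log-probabilities of the chosen and rejected
# responses under the trained policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy's implicit reward margin (relative to the
    reference model) toward preferring chosen over rejected."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```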
Author:
Gandhi, Kanishk, Lee, Denise, Grand, Gabriel, Liu, Muxin, Cheng, Winson, Sharma, Archit, Goodman, Noah D.
Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequences of their actions several steps ahead. In this paper, …
External link:
http://arxiv.org/abs/2404.03683
Author:
Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z., Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E., Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C., Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation…
External link:
http://arxiv.org/abs/2403.12945
Author:
Shi, Lucy Xiaoyang, Hu, Zheyuan, Zhao, Tony Z., Sharma, Archit, Pertsch, Karl, Luo, Jianlan, Levine, Sergey, Finn, Chelsea
Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or… (a generic sketch of such a hierarchy follows this entry).
External link:
http://arxiv.org/abs/2403.12910
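The hierarchy this abstract describes can be pictured as a simple control loop: a high-level planner (an LLM/VLM in this setting) emits language subgoals, and a low-level policy conditions on the current subgoal. The sketch below is a generic illustration under assumed, hypothetical interfaces (HighLevelPlanner, LowLevelPolicy, env), not the paper's actual system.

```python
# Generic hierarchical control loop: language subgoals from a planner,
# motor commands from a subgoal-conditioned low-level policy.
from typing import Any, Protocol

class HighLevelPlanner(Protocol):
    def next_subgoal(self, observation: Any, task: str) -> str: ...

class LowLevelPolicy(Protocol):
    def act(self, observation: Any, subgoal: str) -> Any: ...

def run_episode(env, planner: HighLevelPlanner, policy: LowLevelPolicy,
                task: str, replan_every: int = 50, max_steps: int = 500) -> None:
    # `env` is a hypothetical environment with reset() -> obs and
    # step(action) -> (obs, done); not any specific robotics API.
    obs = env.reset()
    subgoal = planner.next_subgoal(obs, task)  # e.g. "pick up the cup"
    for t in range(1, max_steps + 1):
        obs, done = env.step(policy.act(obs, subgoal))
        if done:
            break
        if t % replan_every == 0:
            # Periodically query the high-level planner for a fresh
            # language subgoal based on the latest observation.
            subgoal = planner.next_subgoal(obs, task)
```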
Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models. RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model… (an outline of this two-stage recipe follows this entry).
External link:
http://arxiv.org/abs/2402.12366
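The two-stage RLAIF recipe the abstract sketches (SFT on teacher demonstrations, then RL from AI-labeled preferences) might be outlined as below. Every helper here (supervised_fine_tune, preference_optimize, the Generator and Critic interfaces) is a hypothetical placeholder, not a real library API.

```python
# Hedged outline of the RLAIF pipeline: SFT, then preference RL.
from typing import List, Protocol, Tuple

class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...

class Critic(Protocol):
    def prefers_first(self, prompt: str, a: str, b: str) -> bool: ...

def rlaif(base_model, teacher: Generator, critic: Critic,
          prompts: List[str]):
    # Stage 1: supervised fine-tuning on teacher completions.
    demos = [(p, teacher.generate(p)) for p in prompts]
    policy = supervised_fine_tune(base_model, demos)      # hypothetical

    # Stage 2: sample response pairs, let the AI critic rank them, and
    # optimize on the resulting preference labels (e.g., PPO against a
    # learned reward model, or a DPO-style contrastive objective).
    prefs: List[Tuple[str, str, str]] = []
    for p in prompts:
        a, b = policy.generate(p), policy.generate(p)
        chosen, rejected = (a, b) if critic.prefers_first(p, a, b) else (b, a)
        prefs.append((p, chosen, rejected))
    return preference_optimize(policy, prefs)             # hypothetical
```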
Author:
Stephan, Moritz, Khazatsky, Alexander, Mitchell, Eric, Chen, Annie S, Hsu, Sheryl, Sharma, Archit, Finn, Chelsea
The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments…
External link:
http://arxiv.org/abs/2402.10893
Author:
Luo, Jianlan, Hu, Zheyuan, Xu, Charles, Tan, You Liang, Berg, Jacob, Sharma, Archit, Schaal, Stefan, Finn, Chelsea, Gupta, Abhishek, Levine, Sergey
In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior…
External link:
http://arxiv.org/abs/2401.16013
Author:
Chen, Annie S., Chada, Govind, Smith, Laura, Sharma, Archit, Fu, Zipeng, Levine, Sergey, Finn, Chelsea
To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned…
External link:
http://arxiv.org/abs/2311.01059
The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm…
External link:
http://arxiv.org/abs/2310.15145
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other… (a minimal sketch of this two-stage pipeline follows below).
External link:
http://arxiv.org/abs/2310.12962
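For illustration, the two-stage pipeline this abstract describes reduces to one shared objective trained on two different data distributions. A minimal sketch, assuming only that `model` maps token ids to next-token logits:

```python
# Both stages minimize the same next-token cross-entropy loss; they
# differ mainly in the data they see (broad web-scale text for
# pre-training vs. targeted alignment examples for fine-tuning).
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Causal LM objective shared by pre-training and fine-tuning."""
    logits = model(token_ids[:, :-1])  # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           token_ids[:, 1:].reshape(-1))

def train_stage(model, batches, optimizer) -> None:
    """One training stage: pass web-scale batches for pre-training,
    curated instruction/alignment batches for fine-tuning."""
    for token_ids in batches:
        optimizer.zero_grad()
        next_token_loss(model, token_ids).backward()
        optimizer.step()
```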