Showing 1 - 10 of 144 for the search: '"Sharma, Archit"'
Author:
Tajwar, Fahim, Singh, Anikait, Sharma, Archit, Rafailov, Rafael, Schneider, Jeff, Xie, Tengyang, Ermon, Stefano, Finn, Chelsea, Kumar, Aviral
Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning… (a sketch of one contrastive objective follows this entry).
External link:
http://arxiv.org/abs/2404.14367
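The contrastive approach named in the abstract can be made concrete with a DPO-style objective. This is a minimal, hedged sketch assuming summed per-sequence log-probabilities are already available; it shows one common contrastive formulation, not necessarily the exact one the paper analyzes.

```python
# Minimal sketch of a DPO-style contrastive preference loss.
# Inputs are summed log-probabilities of the chosen and rejected
# responses under the trained policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy's implicit reward margin (relative to the
    reference model) toward preferring chosen over rejected."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```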
Author:
Gandhi, Kanishk, Lee, Denise, Grand, Gabriel, Liu, Muxin, Cheng, Winson, Sharma, Archit, Goodman, Noah D.
Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequences of their actions several steps ahead. In this paper, …
External link:
http://arxiv.org/abs/2404.03683
Author:
Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z., Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E., Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C., Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation…
External link:
http://arxiv.org/abs/2403.12945
Author:
Shi, Lucy Xiaoyang, Hu, Zheyuan, Zhao, Tony Z., Sharma, Archit, Pertsch, Karl, Luo, Jianlan, Levine, Sergey, Finn, Chelsea
Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or… (a generic sketch of such a hierarchy follows this entry).
External link:
http://arxiv.org/abs/2403.12910
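The hierarchy this abstract describes can be pictured as a simple control loop: a high-level planner (an LLM/VLM in this setting) emits language subgoals, and a low-level policy conditions on the current subgoal. The sketch below is a generic illustration under assumed, hypothetical interfaces (HighLevelPlanner, LowLevelPolicy, env), not the paper's actual system.

```python
# Generic hierarchical control loop: language subgoals from a planner,
# motor commands from a subgoal-conditioned low-level policy.
from typing import Any, Protocol

class HighLevelPlanner(Protocol):
    def next_subgoal(self, observation: Any, task: str) -> str: ...

class LowLevelPolicy(Protocol):
    def act(self, observation: Any, subgoal: str) -> Any: ...

def run_episode(env, planner: HighLevelPlanner, policy: LowLevelPolicy,
                task: str, replan_every: int = 50, max_steps: int = 500) -> None:
    # `env` is a hypothetical environment with reset() -> obs and
    # step(action) -> (obs, done); not any specific robotics API.
    obs = env.reset()
    subgoal = planner.next_subgoal(obs, task)  # e.g. "pick up the cup"
    for t in range(1, max_steps + 1):
        obs, done = env.step(policy.act(obs, subgoal))
        if done:
            break
        if t % replan_every == 0:
            # Periodically query the high-level planner for a fresh
            # language subgoal based on the latest observation.
            subgoal = planner.next_subgoal(obs, task)
```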
Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models. RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model… (an outline of this two-stage recipe follows this entry).
External link:
http://arxiv.org/abs/2402.12366
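The two-stage RLAIF recipe the abstract sketches (SFT on teacher demonstrations, then RL from AI-labeled preferences) might be outlined as below. Every helper here (supervised_fine_tune, preference_optimize, the Generator and Critic interfaces) is a hypothetical placeholder, not a real library API.

```python
# Hedged outline of the RLAIF pipeline: SFT, then preference RL.
from typing import List, Protocol, Tuple

class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...

class Critic(Protocol):
    def prefers_first(self, prompt: str, a: str, b: str) -> bool: ...

def rlaif(base_model, teacher: Generator, critic: Critic,
          prompts: List[str]):
    # Stage 1: supervised fine-tuning on teacher completions.
    demos = [(p, teacher.generate(p)) for p in prompts]
    policy = supervised_fine_tune(base_model, demos)      # hypothetical

    # Stage 2: sample response pairs, let the AI critic rank them, and
    # optimize on the resulting preference labels (e.g., PPO against a
    # learned reward model, or a DPO-style contrastive objective).
    prefs: List[Tuple[str, str, str]] = []
    for p in prompts:
        a, b = policy.generate(p), policy.generate(p)
        chosen, rejected = (a, b) if critic.prefers_first(p, a, b) else (b, a)
        prefs.append((p, chosen, rejected))
    return preference_optimize(policy, prefs)             # hypothetical
```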
Author:
Stephan, Moritz, Khazatsky, Alexander, Mitchell, Eric, Chen, Annie S, Hsu, Sheryl, Sharma, Archit, Finn, Chelsea
The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments…
External link:
http://arxiv.org/abs/2402.10893
Author:
Luo, Jianlan, Hu, Zheyuan, Xu, Charles, Tan, You Liang, Berg, Jacob, Sharma, Archit, Schaal, Stefan, Finn, Chelsea, Gupta, Abhishek, Levine, Sergey
In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior…
External link:
http://arxiv.org/abs/2401.16013
Author:
Chen, Annie S., Chada, Govind, Smith, Laura, Sharma, Archit, Fu, Zipeng, Levine, Sergey, Finn, Chelsea
To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned…
External link:
http://arxiv.org/abs/2311.01059
The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm…
External link:
http://arxiv.org/abs/2310.15145
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other… (a minimal sketch of this two-stage pipeline follows below).
External link:
http://arxiv.org/abs/2310.12962
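For illustration, the two-stage pipeline this abstract describes reduces to one shared objective trained on two different data distributions. A minimal sketch, assuming only that `model` maps token ids to next-token logits:

```python
# Both stages minimize the same next-token cross-entropy loss; they
# differ mainly in the data they see (broad web-scale text for
# pre-training vs. targeted alignment examples for fine-tuning).
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Causal LM objective shared by pre-training and fine-tuning."""
    logits = model(token_ids[:, :-1])  # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           token_ids[:, 1:].reshape(-1))

def train_stage(model, batches, optimizer) -> None:
    """One training stage: pass web-scale batches for pre-training,
    curated instruction/alignment batches for fine-tuning."""
    for token_ids in batches:
        optimizer.zero_grad()
        next_token_loss(model, token_ids).backward()
        optimizer.step()
```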