Search results

Report

Vision Language Models are In-Context Value Learners

Authors: Ma, Yecheng Jason, Hejna, Joey, Wahid, Ayzaan, Fu, Chuyuan, Shah, Dhruv, Liang, Jacky, Xu, Zhuo, Kirmani, Sean, Xu, Peng, Driess, Danny, Xiao, Ted, Tompson, Jonathan, Bastani, Osbert, Jayaraman, Dinesh, Yu, Wenhao, Zhang, Tingnan, Sadigh, Dorsa, Xia, Fei

Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such progress estimator, or temporal value function, across different tasks and domains requires both a la

External link: http://arxiv.org/abs/2411.04549

Report

So You Think You Can Scale Up Autonomous Robot Data Collection?

Authors: Mirchandani, Suvir, Belkhale, Suneel, Hejna, Joey, Choi, Evelyn, Islam, Md Sazzad, Sadigh, Dorsa

A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) comes with the promise of enabling autonomous data collection, it remains challenging to scale in the real-w

External link: http://arxiv.org/abs/2411.01813

Report

MotIF: Motion Instruction Fine-tuning

Authors: Hwang, Minyoung, Hejna, Joey, Sadigh, Dorsa, Bisk, Yonatan

While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state - e.g., if an apple is picked up - many tasks require observing the full motion of the robot to correctly determine suc

External link: http://arxiv.org/abs/2409.10683

Report

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

Authors: Hejna, Joey, Bhateja, Chethan, Jian, Yichen, Pertsch, Karl, Sadigh, Dorsa

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language processing, little

External link: http://arxiv.org/abs/2408.14037

Report

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Authors: Rafailov, Rafael, Chittepu, Yaswanth, Park, Ryan, Sikchi, Harshit, Hejna, Joey, Knox, Bradley, Finn, Chelsea, Niekum, Scott

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represen

External link: http://arxiv.org/abs/2406.02900

Report

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Authors: Shaikh, Omar, Lam, Michelle, Hejna, Joey, Shao, Yijia, Bernstein, Michael, Yang, Diyi

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large

External link: http://arxiv.org/abs/2406.00888

Report

Octo: An Open-Source Generalist Robot Policy

Authors: Octo Model Team, Ghosh, Dibya, Walke, Homer, Pertsch, Karl, Black, Kevin, Mees, Oier, Dasari, Sudeep, Hejna, Joey, Kreiman, Tobias, Xu, Charles, Luo, Jianlan, Tan, You Liang, Chen, Lawrence Yunliang, Sanketi, Pannag, Vuong, Quan, Xiao, Ted, Sadigh, Dorsa, Finn, Chelsea, Levine, Sergey

Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize bro

External link: http://arxiv.org/abs/2405.12213

Report

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Authors: Rafailov, Rafael, Hejna, Joey, Park, Ryan, Finn, Chelsea

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preferen

External link: http://arxiv.org/abs/2404.12358

Report

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z., Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E., Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C., Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea

The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipul

External link: http://arxiv.org/abs/2403.12945

Report

Contrastive Preference Learning: Learning from Human Feedback without RL

Authors: Hejna, Joey, Rafailov, Rafael, Sikchi, Harshit, Finn, Chelsea, Niekum, Scott, Knox, W. Bradley, Sadigh, Dorsa

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the

External link: http://arxiv.org/abs/2310.13639
