Search results

Report

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

Authors: Hejna, Joey; Bhateja, Chethan; Jian, Yichen; Pertsch, Karl; Sadigh, Dorsa

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language processing, little

External link: http://arxiv.org/abs/2408.14037

Report

On Lipschitz spaces in the Dunkl setting -- semigroup approach

Authors: Dziubański, Jacek; Hejna, Agnieszka

Let $\{P_t\}_{t>0}$ be the Dunkl-Poisson semigroup associated with a root system $R\subset \mathbb R^N$ and a multiplicity function $k\geq 0$. Analogously to the classical theory, we say that a bounded measurable function $f$ defined on $\mathbb R^N$

External link: http://arxiv.org/abs/2408.12399

Report

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Authors: Rafailov, Rafael; Chittepu, Yaswanth; Park, Ryan; Sikchi, Harshit; Hejna, Joey; Knox, Bradley; Finn, Chelsea; Niekum, Scott

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represen

External link: http://arxiv.org/abs/2406.02900

Report

Harnessing Business and Media Insights with Large Language Models

This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a

External link: http://arxiv.org/abs/2406.06559

Report

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Authors: Shaikh, Omar; Lam, Michelle; Hejna, Joey; Shao, Yijia; Bernstein, Michael; Yang, Diyi

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large

External link: http://arxiv.org/abs/2406.00888

Report

Octo: An Open-Source Generalist Robot Policy

Authors: Octo Model Team; Ghosh, Dibya; Walke, Homer; Pertsch, Karl; Black, Kevin; Mees, Oier; Dasari, Sudeep; Hejna, Joey; Kreiman, Tobias; Xu, Charles; Luo, Jianlan; Tan, You Liang; Chen, Lawrence Yunliang; Sanketi, Pannag; Vuong, Quan; Xiao, Ted; Sadigh, Dorsa; Finn, Chelsea; Levine, Sergey

Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize bro

External link: http://arxiv.org/abs/2405.12213

Report

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Authors: Rafailov, Rafael; Hejna, Joey; Park, Ryan; Finn, Chelsea

Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preferen

External link: http://arxiv.org/abs/2404.12358

Report

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Khazatsky, Alexander; Pertsch, Karl; Nair, Suraj; Balakrishna, Ashwin; Dasari, Sudeep; Karamcheti, Siddharth; Nasiriany, Soroush; Srirama, Mohan Kumar; Chen, Lawrence Yunliang; Ellis, Kirsty; Fagan, Peter David; Hejna, Joey; Itkina, Masha; Lepert, Marion; Ma, Yecheng Jason; Miller, Patrick Tree; Wu, Jimmy; Belkhale, Suneel; Dass, Shivin; Ha, Huy; Jain, Arhan; Lee, Abraham; Lee, Youngwoon; Memmel, Marius; Park, Sungjae; Radosavovic, Ilija; Wang, Kaiyuan; Zhan, Albert; Black, Kevin; Chi, Cheng; Hatch, Kyle Beltran; Lin, Shan; Lu, Jingpei; Mercat, Jean; Rehman, Abdul; Sanketi, Pannag R; Sharma, Archit; Simpson, Cody; Vuong, Quan; Walke, Homer Rich; Wulfe, Blake; Xiao, Ted; Yang, Jonathan Heewon; Yavary, Arefeh; Zhao, Tony Z.; Agia, Christopher; Baijal, Rohan; Castro, Mateo Guaman; Chen, Daphne; Chen, Qiuyu; Chung, Trinity; Drake, Jaimyn; Foster, Ethan Paul; Gao, Jensen; Herrera, David Antonio; Heo, Minho; Hsu, Kyle; Hu, Jiaheng; Jackson, Donovon; Le, Charlotte; Li, Yunshuang; Lin, Kevin; Lin, Roy; Ma, Zehan; Maddukuri, Abhiram; Mirchandani, Suvir; Morton, Daniel; Nguyen, Tony; O'Neill, Abigail; Scalise, Rosario; Seale, Derick; Son, Victor; Tian, Stephen; Tran, Emi; Wang, Andrew E.; Wu, Yilin; Xie, Annie; Yang, Jingyun; Yin, Patrick; Zhang, Yunchu; Bastani, Osbert; Berseth, Glen; Bohg, Jeannette; Goldberg, Ken; Gupta, Abhinav; Gupta, Abhishek; Jayaraman, Dinesh; Lim, Joseph J; Malik, Jitendra; Martín-Martín, Roberto; Ramamoorthy, Subramanian; Sadigh, Dorsa; Song, Shuran; Wu, Jiajun; Yip, Michael C.; Zhu, Yuke; Kollar, Thomas; Levine, Sergey; Finn, Chelsea

The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipul

External link: http://arxiv.org/abs/2403.12945

Report

Contrastive Preference Learning: Learning from Human Feedback without RL

Authors: Hejna, Joey; Rafailov, Rafael; Sikchi, Harshit; Finn, Chelsea; Niekum, Scott; Knox, W. Bradley; Sadigh, Dorsa

Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the

External link: http://arxiv.org/abs/2310.13639
