Showing 1 - 10 of 1,015
for search: '"Hejna P"'
Author:
Ma, Yecheng Jason, Hejna, Joey, Wahid, Ayzaan, Fu, Chuyuan, Shah, Dhruv, Liang, Jacky, Xu, Zhuo, Kirmani, Sean, Xu, Peng, Driess, Danny, Xiao, Ted, Tompson, Jonathan, Bastani, Osbert, Jayaraman, Dinesh, Yu, Wenhao, Zhang, Tingnan, Sadigh, Dorsa, Xia, Fei
Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such a progress estimator, or temporal value function, across different tasks and domains requires both a large…
External link:
http://arxiv.org/abs/2411.04549
Author:
Mirchandani, Suvir, Belkhale, Suneel, Hejna, Joey, Choi, Evelyn, Islam, Md Sazzad, Sadigh, Dorsa
A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) comes with the promise of enabling autonomous data collection, it remains challenging to scale in the real world…
External link:
http://arxiv.org/abs/2411.01813
While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state - e.g., if an apple is picked up - many tasks require observing the full motion of the robot to correctly determine success…
External link:
http://arxiv.org/abs/2409.10683
Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language processing, little…
External link:
http://arxiv.org/abs/2408.14037
Author:
Dziubański, Jacek, Hejna, Agnieszka
Let $\{P_t\}_{t>0}$ be the Dunkl-Poisson semigroup associated with a root system $R\subset \mathbb R^N$ and a multiplicity function $k\geq 0$. Analogously to the classical theory, we say that a bounded measurable function $f$ defined on $\mathbb R^N$…
External link:
http://arxiv.org/abs/2408.12399
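For orientation only (this note is not part of the catalogue record): when the multiplicity function vanishes, $k \equiv 0$, the Dunkl-Poisson semigroup reduces to the classical Poisson semigroup on $\mathbb R^N$, which can be written as
\[
P_t f(x) = \int_{\mathbb R^N} p_t(x-y)\, f(y)\, dy, \qquad
p_t(x) = \frac{\Gamma\!\left(\tfrac{N+1}{2}\right)}{\pi^{(N+1)/2}}\, \frac{t}{\left(t^2 + |x|^2\right)^{(N+1)/2}},
\]
so that $u(x,t) = P_t f(x)$ is harmonic in the upper half-space $\mathbb R^N \times (0,\infty)$ and $P_t f \to f$ as $t \to 0^+$ for suitable $f$.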
Author:
Rafailov, Rafael, Chittepu, Yaswanth, Park, Ryan, Sikchi, Harshit, Hejna, Joey, Knox, Bradley, Finn, Chelsea, Niekum, Scott
Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent…
External link:
http://arxiv.org/abs/2406.02900
Author:
Bao, Yujia, Shah, Ankit Parag, Narang, Neeru, Rivers, Jonathan, Maksey, Rajeev, Guan, Lan, Barrere, Louise N., Evenson, Shelley, Basole, Rahul, Miao, Connie, Mehta, Ankit, Boulay, Fabien, Park, Su Min, Pearson, Natalie E., Joy, Eldhose, He, Tiger, Thakur, Sumiran, Ghosal, Koustav, On, Josh, Morrison, Phoebe, Major, Tim, Wang, Eva Siqi, Escobar, Gina, Wei, Jiaheng, Weerasooriya, Tharindu Cyril, Song, Queena, Lashkevich, Daria, Chen, Clare, Kim, Gyuhak, Yin, Dengpan, Hejna, Don, Nomeli, Mo, Wei, Wei
This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a…
External link:
http://arxiv.org/abs/2406.06559
Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large…
External link:
http://arxiv.org/abs/2406.00888
Author:
Octo Model Team, Ghosh, Dibya, Walke, Homer, Pertsch, Karl, Black, Kevin, Mees, Oier, Dasari, Sudeep, Hejna, Joey, Kreiman, Tobias, Xu, Charles, Luo, Jianlan, Tan, You Liang, Chen, Lawrence Yunliang, Sanketi, Pannag, Vuong, Quan, Xiao, Ted, Sadigh, Dorsa, Finn, Chelsea, Levine, Sergey
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly…
External link:
http://arxiv.org/abs/2405.12213
Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference…
External link:
http://arxiv.org/abs/2404.12358
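For reference only (not part of the catalogue record): the direct alignment algorithm named in this abstract is presumably Direct Preference Optimization (DPO), whose standard objective for a policy $\pi_\theta$ with reference policy $\pi_{\mathrm{ref}}$, preference pairs $(x, y_w, y_l)$, and temperature $\beta$ is
\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right],
\]
which replaces the separate reward-model and RL stages of the classical RLHF pipeline with a single supervised loss on preference data.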