Showing 1 - 10 of 96
for search: '"Springenberg, Jost Tobias"'
Author:
Abdolmaleki, Abbas, Piot, Bilal, Shahriari, Bobak, Springenberg, Jost Tobias, Hertweck, Tim, Joshi, Rishabh, Oh, Junhyuk, Bloesch, Michael, Lampe, Thomas, Heess, Nicolas, Buchli, Jonas, Riedmiller, Martin
Existing preference optimization methods are mainly designed for directly learning from human feedback with the assumption that paired examples (preferred vs. dis-preferred) are available. In contrast, we propose a method that can leverage unpaired p…
External link:
http://arxiv.org/abs/2410.04166
Author:
Zhang, Jingwei, Lampe, Thomas, Abdolmaleki, Abbas, Springenberg, Jost Tobias, Riedmiller, Martin
We propose an agent architecture that automates parts of the common reinforcement learning experiment workflow, to enable automated mastery of control domains for embodied agents. To do so, it leverages a VLM to perform some of the capabilities norma…
External link:
http://arxiv.org/abs/2409.03402
Author:
Wulfmeier, Markus, Bloesch, Michael, Vieillard, Nino, Ahuja, Arun, Bornschein, Jorg, Huang, Sandy, Sokolov, Artem, Barnes, Matt, Desjardins, Guillaume, Bewley, Alex, Bechtle, Sarah Maria Elisabeth, Springenberg, Jost Tobias, Momchev, Nikola, Bachem, Olivier, Geist, Matthieu, Riedmiller, Martin
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum…
External link:
http://arxiv.org/abs/2409.01369
Author:
Springenberg, Jost Tobias, Abdolmaleki, Abbas, Zhang, Jingwei, Groth, Oliver, Bloesch, Michael, Lampe, Thomas, Brakel, Philemon, Bechtle, Sarah, Kapturowski, Steven, Hafner, Roland, Heess, Nicolas, Riedmiller, Martin
We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behav…
External link:
http://arxiv.org/abs/2402.05546
Author:
Lampe, Thomas, Abdolmaleki, Abbas, Bechtle, Sarah, Huang, Sandy H., Springenberg, Jost Tobias, Bloesch, Michael, Groth, Oliver, Hafner, Roland, Hertweck, Tim, Neunert, Michael, Wulfmeier, Markus, Zhang, Jingwei, Nori, Francesco, Heess, Nicolas, Riedmiller, Martin
Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient t…
External link:
http://arxiv.org/abs/2312.11374
Author:
Bousmalis, Konstantinos, Vezzani, Giulia, Rao, Dushyant, Devin, Coline, Lee, Alex X., Bauza, Maria, Davchev, Todor, Zhou, Yuxiang, Gupta, Agrim, Raju, Akhil, Laurens, Antoine, Fantacci, Claudio, Dalibard, Valentin, Zambelli, Martina, Martins, Murilo, Pevceviciute, Rugile, Blokzijl, Michiel, Denil, Misha, Batchelor, Nathan, Lampe, Thomas, Parisotto, Emilio, Żołna, Konrad, Reed, Scott, Colmenarejo, Sergio Gómez, Scholz, Jon, Abdolmaleki, Abbas, Groth, Oliver, Regli, Jean-Baptiste, Sushkov, Oleg, Rothörl, Tom, Chen, José Enrique, Aytar, Yusuf, Barker, Dave, Ortiz, Joy, Riedmiller, Martin, Springenberg, Jost Tobias, Hadsell, Raia, Nori, Francesco, Heess, Nicolas
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and lan…
External link:
http://arxiv.org/abs/2306.11706
Author:
Schubert, Ingmar, Zhang, Jingwei, Bruce, Jake, Bechtle, Sarah, Parisotto, Emilio, Riedmiller, Martin, Springenberg, Jost Tobias, Byravan, Arunkumar, Hasenclever, Leonard, Heess, Nicolas
We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with sm…
External link:
http://arxiv.org/abs/2305.10912
Author:
Zhang, Jingwei, Springenberg, Jost Tobias, Byravan, Arunkumar, Hasenclever, Leonard, Abdolmaleki, Abbas, Rao, Dushyant, Heess, Nicolas, Riedmiller, Martin
In this paper we study the problem of learning multi-step dynamics prediction models (jumpy models) from unlabeled experience and their utility for fast inference of (high-level) plans in downstream tasks. In particular we propose to learn a jumpy mo…
External link:
http://arxiv.org/abs/2302.12617
Author:
Reed, Scott, Zolna, Konrad, Parisotto, Emilio, Colmenarejo, Sergio Gomez, Novikov, Alexander, Barth-Maron, Gabriel, Gimenez, Mai, Sulsky, Yury, Kay, Jackie, Springenberg, Jost Tobias, Eccles, Tom, Bruce, Jake, Razavi, Ali, Edwards, Ashley, Heess, Nicolas, Chen, Yutian, Hadsell, Raia, Vinyals, Oriol, Bordbar, Mahyar, de Freitas, Nando
Published in:
Transactions on Machine Learning Research, 11/2022, https://openreview.net/forum?id=1ikK0kHjvj
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment…
External link:
http://arxiv.org/abs/2205.06175
Author:
Lee, Alex X., Devin, Coline, Springenberg, Jost Tobias, Zhou, Yuxiang, Lampe, Thomas, Abdolmaleki, Abbas, Bousmalis, Konstantinos
Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in…
External link:
http://arxiv.org/abs/2205.03353