Showing 1 - 10 of 24 for search: '"Bechtle, Sarah"'
Author:
Wulfmeier, Markus, Bloesch, Michael, Vieillard, Nino, Ahuja, Arun, Bornschein, Jorg, Huang, Sandy, Sokolov, Artem, Barnes, Matt, Desjardins, Guillaume, Bewley, Alex, Bechtle, Sarah Maria Elisabeth, Springenberg, Jost Tobias, Momchev, Nikola, Bachem, Olivier, Geist, Matthieu, Riedmiller, Martin
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum
External link:
http://arxiv.org/abs/2409.01369
Author:
Bruce, Jake, Dennis, Michael, Edwards, Ashley, Parker-Holder, Jack, Shi, Yuge, Hughes, Edward, Lai, Matthew, Mavalankar, Aditi, Steigerwald, Richie, Apps, Chris, Aytar, Yusuf, Bechtle, Sarah, Behbahani, Feryal, Chan, Stephanie, Heess, Nicolas, Gonzalez, Lucy, Osindero, Simon, Ozair, Sherjil, Reed, Scott, Zhang, Jingwei, Zolna, Konrad, Clune, Jeff, de Freitas, Nando, Singh, Satinder, Rocktäschel, Tim
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text,
External link:
http://arxiv.org/abs/2402.15391
Author:
Springenberg, Jost Tobias, Abdolmaleki, Abbas, Zhang, Jingwei, Groth, Oliver, Bloesch, Michael, Lampe, Thomas, Brakel, Philemon, Bechtle, Sarah, Kapturowski, Steven, Hafner, Roland, Heess, Nicolas, Riedmiller, Martin
We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behav
External link:
http://arxiv.org/abs/2402.05546
Author:
Lampe, Thomas, Abdolmaleki, Abbas, Bechtle, Sarah, Huang, Sandy H., Springenberg, Jost Tobias, Bloesch, Michael, Groth, Oliver, Hafner, Roland, Hertweck, Tim, Neunert, Michael, Wulfmeier, Markus, Zhang, Jingwei, Nori, Francesco, Heess, Nicolas, Riedmiller, Martin
Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient t
External link:
http://arxiv.org/abs/2312.11374
Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by the growth of required resources, expansive datasets and corresponding investments into computing infrastructure. Although earlier successes predominantly f
External link:
http://arxiv.org/abs/2312.01939
Author:
Pinneri, Cristina, Bechtle, Sarah, Wulfmeier, Markus, Byravan, Arunkumar, Zhang, Jingwei, Whitney, William F., Riedmiller, Martin
We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the ag
External link:
http://arxiv.org/abs/2309.07578
Author:
Schubert, Ingmar, Zhang, Jingwei, Bruce, Jake, Bechtle, Sarah, Parisotto, Emilio, Riedmiller, Martin, Springenberg, Jost Tobias, Byravan, Arunkumar, Hasenclever, Leonard, Heess, Nicolas
We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with sm
External link:
http://arxiv.org/abs/2305.10912
Being able to seamlessly generalize across different tasks is fundamental for robots to act in our world. However, learning representations that generalize quickly to new scenarios is still an open research problem in reinforcement learning. In this
External link:
http://arxiv.org/abs/2204.02210
Inverse reinforcement learning is a paradigm motivated by the goal of learning general reward functions from demonstrated behaviours. Yet the notion of generality for learnt costs is often evaluated in terms of robustness to various spatial perturbat
External link:
http://arxiv.org/abs/2107.03186
Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and any grasped
External link:
http://arxiv.org/abs/2011.03882