Showing 1 - 10 of 31 for search: '"Baker, Bowen"'
Author:
Burns, Collin, Izmailov, Pavel, Kirchner, Jan Hendrik, Baker, Bowen, Gao, Leo, Aschenbrenner, Leopold, Chen, Yining, Ecoffet, Adrien, Joglekar, Manas, Leike, Jan, Sutskever, Ilya, Wu, Jeff
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs…
External link:
http://arxiv.org/abs/2312.09390
Author:
Lightman, Hunter, Kosaraju, Vineet, Burda, Yura, Edwards, Harri, Baker, Bowen, Lee, Teddy, Leike, Jan, Schulman, John, Sutskever, Ilya, Cobbe, Karl
In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to…
External link:
http://arxiv.org/abs/2305.20050
Author:
Baker, Bowen, Akkaya, Ilge, Zhokhov, Peter, Huizinga, Joost, Tang, Jie, Ecoffet, Adrien, Houghton, Brandon, Sampedro, Raul, Clune, Jeff
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games…
External link:
http://arxiv.org/abs/2206.11795
Author:
Kanitscheider, Ingmar, Huizinga, Joost, Farhi, David, Guss, William Hebgen, Houghton, Brandon, Sampedro, Raul, Zhokhov, Peter, Baker, Bowen, Ecoffet, Adrien, Tang, Jie, Klimov, Oleg, Clune, Jeff
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks. If tasks depend on each other (e.g. needing to learn to walk before learning to run), curriculum learning can speed up learning by focusing on…
External link:
http://arxiv.org/abs/2106.14876
Author:
Baker, Bowen
Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments. However, the real world is not zero-sum nor does it have fixed teams; humans face numerous social dilemmas and must learn when…
External link:
http://arxiv.org/abs/2011.05373
Author:
Baker, Bowen, Kanitscheider, Ingmar, Markov, Todor, Wu, Yi, Powell, Glenn, McGrew, Bob, Mordatch, Igor
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many…
External link:
http://arxiv.org/abs/1909.07528
Author:
Baker, Bowen
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and…
External link:
http://hdl.handle.net/1721.1/119511
Author:
OpenAI, Andrychowicz, Marcin, Baker, Bowen, Chociej, Maciek, Jozefowicz, Rafal, McGrew, Bob, Pachocki, Jakub, Petron, Arthur, Plappert, Matthias, Powell, Glenn, Ray, Alex, Schneider, Jonas, Sidor, Szymon, Tobin, Josh, Welinder, Peter, Weng, Lilian, Zaremba, Wojciech
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many…
External link:
http://arxiv.org/abs/1808.00177
Author:
Plappert, Matthias, Andrychowicz, Marcin, Ray, Alex, McGrew, Bob, Baker, Bowen, Powell, Glenn, Schneider, Jonas, Tobin, Josh, Chociej, Maciek, Welinder, Peter, Kumar, Vikash, Zaremba, Wojciech
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place…
External link:
http://arxiv.org/abs/1802.09464
Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the…
External link:
http://arxiv.org/abs/1705.10823