Showing 1 - 10 of 31 for search: '"Baker, Bowen"'
Author:
Burns, Collin, Izmailov, Pavel, Kirchner, Jan Hendrik, Baker, Bowen, Gao, Leo, Aschenbrenner, Leopold, Chen, Yining, Ecoffet, Adrien, Joglekar, Manas, Leike, Jan, Sutskever, Ilya, Wu, Jeff
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs…
External link:
http://arxiv.org/abs/2312.09390
Author:
Lightman, Hunter, Kosaraju, Vineet, Burda, Yura, Edwards, Harri, Baker, Bowen, Lee, Teddy, Leike, Jan, Schulman, John, Sutskever, Ilya, Cobbe, Karl
In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to…
External link:
http://arxiv.org/abs/2305.20050
Author:
Baker, Bowen, Akkaya, Ilge, Zhokhov, Peter, Huizinga, Joost, Tang, Jie, Ecoffet, Adrien, Houghton, Brandon, Sampedro, Raul, Clune, Jeff
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games…
External link:
http://arxiv.org/abs/2206.11795
Author:
Kanitscheider, Ingmar, Huizinga, Joost, Farhi, David, Guss, William Hebgen, Houghton, Brandon, Sampedro, Raul, Zhokhov, Peter, Baker, Bowen, Ecoffet, Adrien, Tang, Jie, Klimov, Oleg, Clune, Jeff
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks. If tasks depend on each other (e.g. needing to learn to walk before learning to run), curriculum learning can speed up learning by focusing on…
External link:
http://arxiv.org/abs/2106.14876
Author:
Baker, Bowen
Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments. However, the real world is not zero-sum nor does it have fixed teams; humans face numerous social dilemmas and must learn when…
External link:
http://arxiv.org/abs/2011.05373
Author:
Baker, Bowen, Kanitscheider, Ingmar, Markov, Todor, Wu, Yi, Powell, Glenn, McGrew, Bob, Mordatch, Igor
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many…
External link:
http://arxiv.org/abs/1909.07528
Author:
Baker, Bowen
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and…
External link:
http://hdl.handle.net/1721.1/119511
Author:
OpenAI, Andrychowicz, Marcin, Baker, Bowen, Chociej, Maciek, Jozefowicz, Rafal, McGrew, Bob, Pachocki, Jakub, Petron, Arthur, Plappert, Matthias, Powell, Glenn, Ray, Alex, Schneider, Jonas, Sidor, Szymon, Tobin, Josh, Welinder, Peter, Weng, Lilian, Zaremba, Wojciech
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many…
External link:
http://arxiv.org/abs/1808.00177
Author:
Plappert, Matthias, Andrychowicz, Marcin, Ray, Alex, McGrew, Bob, Baker, Bowen, Powell, Glenn, Schneider, Jonas, Tobin, Josh, Chociej, Maciek, Welinder, Peter, Kumar, Vikash, Zaremba, Wojciech
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place…
External link:
http://arxiv.org/abs/1802.09464
Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the…
External link:
http://arxiv.org/abs/1705.10823