Showing 1 - 10 of 1,723 for search: '"A. Bagnell"'
Author:
A. Bagnell, T. DeVries
Published in:
Nature Communications, Vol 12, Iss 1, Pp 1-10 (2021)
Cooling of the global ocean below 2000 m counteracted some of the warming of the shallow ocean over much of the late 20th century. Here the authors show that this trend has shifted to warming, leading the deep ocean to absorb a meaningful fraction of …
External link:
https://doaj.org/article/32cbc4fbfddd413c98f2d4d9412279f7
Published in:
Biogeosciences, Vol 16, Pp 2617-2633 (2019)
Nitrate is a critical ingredient for life in the ocean because, as the most abundant form of fixed nitrogen in the ocean, it is an essential nutrient for primary production. The availability of marine nitrate is principally determined by biological p…
External link:
https://doaj.org/article/5a506d5040e74890b0c4151cc1f3b230
We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While Reinforcement Learning (RL) research typically assumes offline data contains complete action, reward and transition …
External link:
http://arxiv.org/abs/2406.07253
Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline …
External link:
http://arxiv.org/abs/2406.01462
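For context on the "offline" family mentioned in the snippet above, a common exemplar is Direct Preference Optimization (DPO). The following is a minimal illustrative sketch of a DPO-style pairwise preference loss, assuming per-sequence log-probabilities are already computed; it is not code from, or the method of, the linked paper.

```python
# Illustrative DPO-style offline preference loss (an assumption about the
# "offline" family referenced above, not the linked paper's algorithm).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise loss on (chosen, rejected) completions.

    All arguments are per-example sequence log-probabilities of shape
    [batch]; `beta` scales the implicit KL penalty toward the reference.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_margin - rejected_margin)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities standing in for model outputs.
batch = (torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(dpo_loss(*batch).item())
```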
Author:
Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., Sun, Wen
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO …
External link:
http://arxiv.org/abs/2404.16767
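As background on the objective PPO is known for, here is a minimal textbook-style sketch of the clipped surrogate loss; it is illustrative only and not the contribution of the linked paper.

```python
# Minimal sketch of PPO's clipped surrogate objective (textbook form).
# `logp_new`/`logp_old` are per-action log-probabilities, `adv` advantages.
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)           # importance ratio pi_new / pi_old
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()     # maximize surrogate -> minimize negative

# Toy usage with random values standing in for rollout statistics.
logp_old = torch.randn(8)
logp_new = logp_old + 0.05 * torch.randn(8)
adv = torch.randn(8)
print(ppo_clip_loss(logp_new, logp_old, adv).item())
```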
The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches …
External link:
http://arxiv.org/abs/2402.08848
Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations. However, it traditionally requires repeatedly solving a computationally expensive reinforcement learning (RL) problem in its inner loop …
External link:
http://arxiv.org/abs/2402.02616
Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: the …
External link:
http://arxiv.org/abs/2303.14623
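Both IRL entries above point at the same structural bottleneck: each outer update of the reward requires an inner RL solve. The toy sketch below shows only that outer/inner structure; the names (`solve_rl`, the feature-matching update) are illustrative placeholders under assumed dynamics, not the papers' algorithms.

```python
# Caricature of the classic IRL loop: an outer loop adjusts a reward weight,
# and every outer step pays for a full policy optimization in the inner loop.

def solve_rl(reward_weight, n_steps=100):
    """Inner loop: costly policy optimization against the current reward.

    Modeled as gradient steps that push the learner's feature expectation
    in the direction the reward weight favors.
    """
    feature = 0.0
    for _ in range(n_steps):
        feature += 0.01 * reward_weight
    return feature

expert_feature = 1.0   # feature expectation of the expert demonstrations
reward_weight = 0.0

for outer_step in range(20):
    learner_feature = solve_rl(reward_weight)              # expensive inner RL solve
    # Outer loop: feature-matching style reward update (MaxEnt-IRL flavour).
    reward_weight += 0.1 * (expert_feature - learner_feature)

print(round(reward_weight, 3), round(solve_rl(reward_weight), 3))
```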
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and …
External link:
http://arxiv.org/abs/2303.00694
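The two challenges named in that snippet correspond to the model-fitting and policy-search steps of the standard MBRL loop. Below is a self-contained toy sketch of that loop on a 1-D linear system; the dynamics, cost, and planning-by-grid-search are assumptions for illustration, not the paper's method.

```python
# Standard MBRL loop on a toy 1-D system: collect data, fit a dynamics
# model, then search for a policy inside the learned model.
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    return 0.9 * s + a + 0.01 * rng.standard_normal()

# 1) Collect transitions with a random policy.
data, s = [], 0.0
for _ in range(200):
    a = rng.uniform(-1, 1)
    s_next = true_dynamics(s, a)
    data.append((s, a, s_next))
    s = s_next

# 2) Model fitting: least-squares linear model s' ~ [s, a].
X = np.array([[si, ai] for si, ai, _ in data])
y = np.array([sn for _, _, sn in data])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3) Policy search inside the learned model: pick the constant action that
#    keeps the state near a target of 1.0 over a short imagined rollout.
def rollout_cost(action, horizon=20):
    s, cost = 0.0, 0.0
    for _ in range(horizon):
        s = theta[0] * s + theta[1] * action
        cost += (s - 1.0) ** 2
    return cost

best_a = min(np.linspace(-1, 1, 201), key=rollout_cost)
print("fitted model:", theta, "best constant action:", best_a)
```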
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction. The framework mitigates the challenges that arise in both pure …
External link:
http://arxiv.org/abs/2210.06718
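Both hybrid-RL entries in this list describe the same data setup: a fixed offline dataset plus online interaction. One common way to realize it is a replay buffer seeded with offline transitions and mixed with online ones at sampling time; the sketch below is illustrative, and the class name and mixing ratio are assumptions rather than the papers' algorithms.

```python
# Minimal hybrid-RL data setup: a buffer seeded with offline transitions,
# grown with online experience, and sampled as a mixture of both.
import random

class HybridBuffer:
    def __init__(self, offline_data):
        self.offline = list(offline_data)   # fixed offline transitions
        self.online = []                    # grows via real-world interaction

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size, online_fraction=0.5):
        """Mix offline and online transitions in each mini-batch."""
        n_online = min(int(batch_size * online_fraction), len(self.online))
        n_offline = batch_size - n_online
        batch = random.sample(self.offline, min(n_offline, len(self.offline)))
        if n_online:
            batch += random.sample(self.online, n_online)
        return batch

# Toy usage: transitions are (state, action, reward, next_state) tuples.
buf = HybridBuffer([(0, 1, 0.5, 1), (1, 0, 0.2, 1), (1, 1, 1.0, 2)])
buf.add_online((2, 1, 0.7, 3))
print(buf.sample(4))
```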