Showing 1 - 10 of 107
for search: '"Harb, Jean"'
Author:
Gulino, Cole, Fu, Justin, Luo, Wenjie, Tucker, George, Bronstein, Eli, Lu, Yiren, Harb, Jean, Pan, Xinlei, Wang, Yan, Chen, Xiangyu, Co-Reyes, John D., Agarwal, Rishabh, Roelofs, Rebecca, Lu, Yao, Montali, Nico, Mougin, Paul, Yang, Zoey, White, Brandyn, Faust, Aleksandra, McAllister, Rowan, Anguelov, Dragomir, Sapp, Benjamin
Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To a
External link:
http://arxiv.org/abs/2310.08710
Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function
External link:
http://arxiv.org/abs/2207.01566
Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and
External link:
http://arxiv.org/abs/2002.11833
Author:
Schaul, Tom, van Hasselt, Hado, Modayil, Joseph, White, Martha, White, Adam, Bacon, Pierre-Luc, Harb, Jean, Mourad, Shibl, Bellemare, Marc, Precup, Doina
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most
External link:
http://arxiv.org/abs/1811.07004
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a
External link:
http://arxiv.org/abs/1712.00004
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good option
External link:
http://arxiv.org/abs/1709.04571
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy
External link:
http://arxiv.org/abs/1706.02275
Author:
Harb, Jean, Precup, Doina
Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with
External link:
http://arxiv.org/abs/1704.05495
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this
External link:
http://arxiv.org/abs/1609.05140
Published in:
Frontiers in Immunology; 2023, p1-8, 8p