Showing 1 - 10 of 24 for search: '"Sahni, Himanshu"'
Author:
Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Chan, Harris, Comanici, Gheorghe, Flennerhag, Sebastian, Gazeau, Maxime, Holsheimer, Kristian, Horgan, Dan, Laskin, Michael, Lyle, Clare, Masoom, Hussain, McKinney, Kay, Mnih, Volodymyr, Neitz, Alexander, Nikulin, Dmitry, Pardo, Fabio, Parker-Holder, Jack, Quan, John, Rocktäschel, Tim, Sahni, Himanshu, Schaul, Tom, Schroecker, Yannick, Spencer, Stephen, Steigerwald, Richie, Wang, Luyu, Zhang, Lei
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of …
External link:
http://arxiv.org/abs/2312.09187
Author:
Laskin, Michael, Wang, Luyu, Oh, Junhyuk, Parisotto, Emilio, Spencer, Stephen, Steigerwald, Richie, Strouse, DJ, Hansen, Steven, Filos, Angelos, Brooks, Ethan, Gazeau, Maxime, Sahni, Himanshu, Singh, Satinder, Mnih, Volodymyr
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn …
External link:
http://arxiv.org/abs/2210.14215
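A minimal sketch of the data preparation the abstract describes: flattening several episodes from a source RL agent's training run into one long token stream, ordered earliest to latest, so a causal sequence model can be trained to predict each action from the entire preceding history. The tuple layout and function names here are illustrative assumptions, not taken from the paper.

```python
# Sketch: build an across-episode training history in the spirit of
# Algorithm Distillation. Each episode is a list of (observation, action,
# reward) steps; episodes are ordered by when they occurred during the
# source agent's training, so later episodes reflect a better policy.

def flatten_history(episodes):
    """Concatenate episodes (earliest first) into one token stream."""
    history = []
    for episode in episodes:
        for obs, action, reward in episode:
            history.extend([("obs", obs), ("act", action), ("rew", reward)])
    return history

def action_prediction_targets(history):
    """Return (context, target_action) pairs: each action token is
    predicted from every token that precedes it in the stream."""
    pairs = []
    for i, (kind, value) in enumerate(history):
        if kind == "act":
            pairs.append((history[:i], value))
    return pairs

# Toy two-episode history from an improving agent.
episodes = [
    [(0, 1, 0.0), (1, 0, 0.0)],   # early episode, low reward
    [(0, 1, 0.0), (1, 1, 1.0)],   # later episode, higher reward
]
history = flatten_history(episodes)
pairs = action_prediction_targets(history)
```

Because the contexts span episode boundaries, a model fit to these pairs must capture how the source agent's behavior improves over training, not just a single policy.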
Author:
Sahni, Himanshu, Isbell, Charles
Biological agents have adopted the principle of attention to limit the rate of incoming information from the environment. One question that arises is: if an artificial agent has access to only a limited view of its surroundings, how can it control its …
External link:
http://arxiv.org/abs/2103.06371
Author:
Edwards, Ashley D., Sahni, Himanshu, Liu, Rosanne, Hung, Jane, Jain, Ankit, Wang, Rui, Ecoffet, Adrien, Miconi, Thomas, Isbell, Charles, Yosinski, Jason
In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a …
External link:
http://arxiv.org/abs/2002.09505
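One way to read the $Q(s, s')$ formulation the abstract describes is as a Bellman-style recursion over state pairs rather than state-action pairs. The following is a sketch consistent with that description, assuming deterministic transitions; it is not necessarily the paper's exact formulation:

```latex
Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s''),
```

where the maximum ranges over states $s''$ reachable from $s'$, $r(s, s')$ is the reward for the transition, and $\gamma$ is the discount factor. Under this reading, acting optimally means selecting the best reachable next state and then recovering the action that produces that transition, e.g. via an inverse dynamics model.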
Reinforcement Learning (RL) algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency by rei…
External link:
http://arxiv.org/abs/1901.11529
In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. We introduce a method that characterizes the causal effects of latent actions on observations while simultaneously predict…
External link:
http://arxiv.org/abs/1805.07914
We present a differentiable framework capable of learning a wide variety of compositions of simple policies that we call skills. By recursively composing skills with themselves, we can create hierarchies that display complex behavior. Skill networks …
External link:
http://arxiv.org/abs/1711.11289
Typical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through …
External link:
http://arxiv.org/abs/1705.08997
Academic article
This result cannot be displayed to users who are not logged in.
You must log in to view the result.
Published in:
2015 11th IEEE International Conference & Workshops on Automatic Face & Gesture Recognition (FG); 2015, p1-7, 7p