Showing 1 - 10 of 62 for search: '"Mnih, Volodymyr"'
Efficient video tokenization remains a key bottleneck in learning general purpose vision models that are capable of processing long video sequences. Prevailing approaches are restricted to encoding videos to a fixed number of tokens, where too few to …
External link:
http://arxiv.org/abs/2410.08368
Author:
Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Chan, Harris, Comanici, Gheorghe, Flennerhag, Sebastian, Gazeau, Maxime, Holsheimer, Kristian, Horgan, Dan, Laskin, Michael, Lyle, Clare, Masoom, Hussain, McKinney, Kay, Mnih, Volodymyr, Neitz, Alexander, Nikulin, Dmitry, Pardo, Fabio, Parker-Holder, Jack, Quan, John, Rocktäschel, Tim, Sahni, Himanshu, Schaul, Tom, Schroecker, Yannick, Spencer, Stephen, Steigerwald, Richie, Wang, Luyu, Zhang, Lei
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number o …
External link:
http://arxiv.org/abs/2312.09187
Author:
Laskin, Michael, Wang, Luyu, Oh, Junhyuk, Parisotto, Emilio, Spencer, Stephen, Steigerwald, Richie, Strouse, DJ, Hansen, Steven, Filos, Angelos, Brooks, Ethan, Gazeau, Maxime, Sahni, Himanshu, Singh, Satinder, Mnih, Volodymyr
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement lea …
External link:
http://arxiv.org/abs/2210.14215
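The Algorithm Distillation entry above describes training a causal sequence model on the *training histories* of a source RL algorithm rather than on single episodes. A minimal sketch of that data layout (function names and constants here are illustrative, not taken from the paper):

```python
import random

# Hedged sketch of the data layout behind Algorithm Distillation (AD):
# the sequence model is trained on long "learning histories" spanning
# many consecutive episodes of a source RL algorithm, so next-action
# prediction forces it to model the improvement operator itself.
# make_learning_history and next_action_targets are illustrative names.

random.seed(0)

def make_learning_history(num_episodes=5, episode_len=4, num_actions=3):
    """Concatenate (obs, action, reward) triples across consecutive
    episodes, in the order the source algorithm generated them."""
    history = []
    for _episode in range(num_episodes):
        for _step in range(episode_len):
            obs = (random.random(), random.random())
            action = random.randrange(num_actions)
            reward = 1.0 if random.random() > 0.5 else 0.0
            history.append((obs, action, reward))
    return history

def next_action_targets(history):
    """Causal targets: at position t the model predicts action a_t
    from everything in the history before it."""
    return [action for (_obs, action, _reward) in history]

history = make_learning_history()
targets = next_action_targets(history)
assert len(history) == 20 and len(targets) == 20
```

Because the context window spans episode boundaries, the model sees rewards improving over the history, which is what lets it imitate the learning algorithm rather than a single policy.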
Large and diverse datasets have been the cornerstones of many impressive advancements in artificial intelligence. Intelligent creatures, however, learn by interacting with the environment, which changes the input sensory signals and the state of the …
External link:
http://arxiv.org/abs/2210.10913
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal. Mutual information based objectives have shown some success in learning skills that reach a diverse set of states in th …
External link:
http://arxiv.org/abs/2110.15331
Author:
Zahavy, Tom, O'Donoghue, Brendan, Barreto, Andre, Mnih, Volodymyr, Flennerhag, Sebastian, Singh, Satinder
Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, an …
External link:
http://arxiv.org/abs/2106.00669
In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment. Existing skill learning methods use mutual information objectives to incentivize each skill to …
External link:
http://arxiv.org/abs/2012.07827
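Several of the skill-learning entries above rely on a mutual-information objective: the intrinsic reward for acting with skill z in state s is log q(z|s) − log p(z), where q is a learned skill discriminator. A toy sketch of that reward (the nearest-center softmax classifier below is an illustrative stand-in for a learned discriminator network):

```python
import math

# Illustrative sketch of a mutual-information skill reward in the
# spirit of these papers: r(s, z) = log q(z|s) - log p(z).
# The distance-based logits are a toy stand-in for a learned model.

def discriminator_logits(state, skill_centers):
    # q(z|s): a skill is "responsible" for states near its center
    return [-sum((s - c) ** 2 for s, c in zip(state, center))
            for center in skill_centers]

def intrinsic_reward(state, z, skill_centers):
    logits = discriminator_logits(state, skill_centers)
    log_norm = math.log(sum(math.exp(l) for l in logits))
    log_q = logits[z] - log_norm
    log_p = -math.log(len(skill_centers))  # uniform prior over skills
    return log_q - log_p

centers = [(0.0, 0.0), (3.0, 3.0)]
# a state near skill 0's center earns a positive reward under z = 0
reward = intrinsic_reward((0.1, -0.1), 0, centers)
assert reward > 0
```

Maximizing this reward pushes each skill toward states where the discriminator can tell it apart from the other skills, which is the diversity pressure these abstracts refer to.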
Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization over all a …
External link:
http://arxiv.org/abs/2001.08116
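The amortized Q-learning entry above replaces the exact maximization max_a Q(s, a) with a maximum over a small set of candidate actions drawn from a learned proposal distribution. A minimal sketch of that idea (the quadratic Q-function and fixed Gaussian proposal below are toy stand-ins for the paper's learned networks):

```python
import random

# Hedged sketch of amortized maximization: instead of enumerating every
# action to evaluate max_a Q(s, a), sample candidates from a proposal
# distribution (learned in practice) and keep the best one.

random.seed(0)

def q_value(state, action):
    # toy continuous-action Q, peaked where action == state
    return -sum((a - s) ** 2 for a, s in zip(action, state))

def amortized_max(state, proposal_mean, num_candidates=64):
    best_action, best_q = None, float("-inf")
    for _ in range(num_candidates):
        # proposal: Gaussian around proposal_mean (a stand-in here)
        candidate = [m + random.gauss(0.0, 0.5) for m in proposal_mean]
        q = q_value(state, candidate)
        if q > best_q:
            best_action, best_q = candidate, q
    return best_action, best_q

state = [1.0, -1.0]
best_action, best_q = amortized_max(state, proposal_mean=state)
assert best_q <= 0.0
```

The cost of the argmax now scales with the number of sampled candidates rather than the size of the action space, which is what makes the approach viable for continuous actions.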
Author:
Kulkarni, Tejas, Gupta, Ankush, Ionescu, Catalin, Borgeaud, Sebastian, Reynolds, Malcolm, Zisserman, Andrew, Mnih, Volodymyr
The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object r …
External link:
http://arxiv.org/abs/1906.11883
Author:
Hansen, Steven, Dabney, Will, Barreto, Andre, Van de Wiele, Tom, Warde-Farley, David, Mnih, Volodymyr
It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies \citep{gregor2016variational, eysenbach2018diversity, w …
External link:
http://arxiv.org/abs/1906.05030