Showing 1 - 10 of 561 for search: '"Singh, Satinder P."'
We propose a blind ML-based modulation detection for OFDM-based technologies. Unlike previous works that assume an ideal environment with precise knowledge of subcarrier count and cyclic prefix location, we consider blind modulation detection while a
External link:
http://arxiv.org/abs/2408.08179
Author:
Bruce, Jake, Dennis, Michael, Edwards, Ashley, Parker-Holder, Jack, Shi, Yuge, Hughes, Edward, Lai, Matthew, Mavalankar, Aditi, Steigerwald, Richie, Apps, Chris, Aytar, Yusuf, Bechtle, Sarah, Behbahani, Feryal, Chan, Stephanie, Heess, Nicolas, Gonzalez, Lucy, Osindero, Simon, Ozair, Sherjil, Reed, Scott, Zhang, Jingwei, Zolna, Konrad, Clune, Jeff, de Freitas, Nando, Singh, Satinder, Rocktäschel, Tim
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text,
External link:
http://arxiv.org/abs/2402.15391
Author:
Carvalho, Wilka, Saraiva, Andre, Filos, Angelos, Lampinen, Andrew Kyle, Matthey, Loic, Lewis, Richard L., Lee, Honglak, Singh, Satinder, Rezende, Danilo J., Zoran, Daniel
The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement
External link:
http://arxiv.org/abs/2310.15940
Author:
Zahavy, Tom, Veeriah, Vivek, Hou, Shaobo, Waugh, Kevin, Lai, Matthew, Leurent, Edouard, Tomasev, Nenad, Schut, Lisa, Hassabis, Demis, Singh, Satinder
In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations.
External link:
http://arxiv.org/abs/2308.09175
Author:
Abel, David, Barreto, André, Van Roy, Benjamin, Precup, Doina, van Hasselt, Hado, Singh, Satinder
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than trea
External link:
http://arxiv.org/abs/2307.11046
Author:
Abel, David, Barreto, André, van Hasselt, Hado, Van Roy, Benjamin, Precup, Doina, Singh, Satinder
When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we
External link:
http://arxiv.org/abs/2307.11044
Author:
Lu, Chris, Schroecker, Yannick, Gu, Albert, Parisotto, Emilio, Foerster, Jakob, Singh, Satinder, Behbahani, Feryal
Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many re
External link:
http://arxiv.org/abs/2303.03982
Author:
Pires, Bernardo Avila, Behbahani, Feryal, Soyer, Hubert, Nikiforou, Kyriacos, Keck, Thomas, Singh, Satinder
Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse. Recent successes with HRL across different domains provide evidenc
External link:
http://arxiv.org/abs/2302.14451
Author:
Moskovitz, Ted, O'Donoghue, Brendan, Veeriah, Vivek, Flennerhag, Sebastian, Singh, Satinder, Zahavy, Tom
In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require putting constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient de
External link:
http://arxiv.org/abs/2302.01275
Recently, the Successor Features and Generalized Policy Improvement (SF&GPI) framework has been proposed as a method for learning, composing, and transferring predictive knowledge and behavior. SF&GPI works by having an agent learn predictive represe
External link:
http://arxiv.org/abs/2301.12305