Search results - "Chandak, Yash"

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Authors: Lee, Jonathan N., Xie, Annie, Pacchiano, Aldo, Chandak, Yash, Finn, Chelsea, Nachum, Ofir, Brunskill, Emma

Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabili

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5b4400eba61e9bbcdad1844df9586524
http://arxiv.org/abs/2306.14892

Coagent Networks: Generalized and Scaled

Authors: Kostas, James E., Jordan, Scott M., Chandak, Yash, Theocharous, Georgios, Gupta, Dhawal, White, Martha, da Silva, Bruno Castro, Thomas, Philip S.

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpr

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c1052ad7e823707ca3dcad2409f0b8f0
http://arxiv.org/abs/2305.09838

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

Authors: Chandak, Yash, Thakoor, Shantanu, Guo, Zhaohan Daniel, Tang, Yunhao, Munos, Rémi, Dabney, Will, Borsa, Diana L

Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlyi

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8e049dc56c75b278c82fa44306497334

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Authors: Chandak, Yash, Shankar, Shiv, Bastian, Nathaniel D., da Silva, Bruno Castro, Brunskill, Emma, Thomas, Philip S.

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to exte

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::52856c68ea94985379bd5db39b76260d

Understanding Self-Predictive Learning for Reinforcement Learning

Authors: Tang, Yunhao, Guo, Zhaohan Daniel, Richemond, Pierre Harvey, Pires, Bernardo Ávila, Chandak, Yash, Munos, Rémi, Rowland, Mark, Azar, Mohammad Gheshlaghi, Lan, Charline Le, Lyle, Clare, György, András, Thakoor, Shantanu, Dabney, Will, Piot, Bilal, Calandriello, Daniele, Valko, Michal

We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical succe

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3c9c957441083131b4a6c939b4480458

Towards Safe Policy Improvement for Non-Stationary MDPs

Authors: Chandak, Yash, Jordan, Scott M., Theocharous, Georgios, White, Martha, Thomas, Philip S.

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is sta

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cbbe2b153e084a687d46c40beb982700
