Showing 1 - 10 of 35
for search: '"Chandak, Yash P."'
From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients, and consumers. One major bottleneck in this …
External link:
http://arxiv.org/abs/2407.03674
Author:
Grinsztajn, Nathan, Flet-Berliac, Yannis, Azar, Mohammad Gheshlaghi, Strub, Florian, Wu, Bill, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Pietquin, Olivier, Geist, Matthieu
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a … (see the objective sketched after this entry)
External link:
http://arxiv.org/abs/2406.19188
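For context, the "reward model plus regularized RL" recipe this abstract refers to is usually the KL-regularized objective below. The notation (reward model r_phi, reference policy pi_ref, regularization strength beta) is the standard one from the RLHF literature, not taken from this paper:

\[
\max_{\theta}\;\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\!\left[\, r_\phi(x, y) \,\right] \;-\; \beta\, \mathrm{D}_{\mathrm{KL}}\!\left( \pi_\theta(\cdot \mid x) \;\|\; \pi_{\mathrm{ref}}(\cdot \mid x) \right)
\]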
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Author:
Flet-Berliac, Yannis, Grinsztajn, Nathan, Strub, Florian, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Azar, Mohammad Gheshlaghi, Pietquin, Olivier, Geist, Matthieu
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, … (a loss sketch follows this entry)
External link:
http://arxiv.org/abs/2406.19185
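As a reference point for the "direct alignment" family mentioned in this abstract, below is a minimal sketch of the DPO loss, the canonical direct alignment objective. It is not the contrastive policy gradient introduced in this paper, and the tensor names are illustrative assumptions:

import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss on sequence-level log-likelihoods.

    Inputs are the summed token log-probabilities of the preferred ("chosen")
    and dispreferred ("rejected") completions under the policy and under a
    frozen reference model; beta plays the role of the KL-regularization
    strength.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

The appeal of this family is that preferences are optimized with an ordinary supervised-style loss over logged completions, with no sampling or reward model inside the training loop.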
Author:
Nie, Allen, Chandak, Yash, Yuan, Christina J., Badrinath, Anirudhan, Flet-Berliac, Yannis, Brunskill, Emma
Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate … (an estimator sketch follows this entry)
External link:
http://arxiv.org/abs/2405.17708
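A minimal sketch of the classical estimator this line of work builds on, ordinary per-trajectory importance sampling; the data layout and function names are illustrative assumptions, not the estimator proposed in the paper:

import numpy as np

def per_trajectory_is(trajectories, pi_e, gamma=1.0):
    """Ordinary importance-sampling OPE estimate of a new policy pi_e.

    Each trajectory is a list of (state, action, reward, behavior_prob) tuples
    logged under some behavior policy; pi_e(action, state) returns the new
    policy's probability of the logged action.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r, b_prob) in enumerate(traj):
            weight *= pi_e(a, s) / b_prob      # cumulative importance ratio
            ret += (gamma ** t) * r            # discounted return of the trajectory
        estimates.append(weight * ret)
    return float(np.mean(estimates))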
Author:
Nie, Allen, Chandak, Yash, Suzara, Miroslav, Ali, Malika, Woodrow, Juliette, Peng, Matt, Sahami, Mehran, Brunskill, Emma, Piech, Chris
Large language models (LLMs) are quickly being adopted in a wide range of learning experiences, especially via ubiquitous and broadly accessible chat interfaces like ChatGPT and Copilot. This type of interface is readily available to students and teachers …
External link:
http://arxiv.org/abs/2407.09975
A/B tests often have to be conducted on subjects that may have social connections: for example, experiments on social media, or medical and social interventions to control the spread of an epidemic. In such settings, the SUTVA assumption … (a small simulation follows this entry)
External link:
http://arxiv.org/abs/2404.10547
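A small simulation of why the naive difference-in-means estimate misses part of the effect once SUTVA fails; the outcome model, effect sizes, and random "neighbour" structure below are made-up illustrations, not the setting studied in the paper:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
treated = rng.random(n) < 0.5                    # Bernoulli(0.5) assignment
neighbours = rng.integers(0, n, size=(n, 10))    # 10 random contacts per unit
exposure = treated[neighbours].mean(axis=1)      # fraction of treated contacts

direct, spillover = 1.0, 0.5                     # assumed effect sizes
outcome = direct * treated + spillover * exposure + rng.normal(size=n)

# naive difference-in-means implicitly assumes SUTVA (no interference)
naive_ate = outcome[treated].mean() - outcome[~treated].mean()
global_effect = direct + spillover               # everyone treated vs. no one treated
print(f"naive estimate {naive_ate:.2f} vs. global effect {global_effect:.2f}")

Under this toy model the naive estimate recovers roughly the direct effect (about 1.0), while the effect of treating everyone versus treating no one is 1.5, because spillover reaches control units too.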
Indirect experiments provide a valuable framework for estimating treatment effects in situations where conducting randomized controlled trials (RCTs) is impractical or unethical. Unlike RCTs, indirect experiments estimate treatment effects by leveraging … (an IV-estimator sketch follows this entry)
External link:
http://arxiv.org/abs/2312.02438
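One classical way to estimate a treatment effect without randomizing the treatment itself is the Wald / instrumental-variable estimator from an encouragement design; this is a minimal sketch of that standard idea, not the methodology developed in the paper:

import numpy as np

def wald_iv_estimate(z, t, y):
    """Wald estimator for the effect of a binary treatment t on outcome y,
    using a randomized binary encouragement z as an instrument."""
    z, t, y = map(np.asarray, (z, t, y))
    itt = y[z == 1].mean() - y[z == 0].mean()        # intent-to-treat effect
    uptake = t[z == 1].mean() - t[z == 0].mean()     # first-stage effect on uptake
    return itt / uptake                              # local average treatment effect

It scales the effect of the encouragement on outcomes by how much the encouragement actually moved treatment uptake.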
Designing reward functions that efficiently guide reinforcement learning (RL) agents toward specific behaviors is a complex task. It is challenging because it requires identifying reward structures that are not sparse and that avoid inadvertent … (a reward-shaping sketch follows this entry)
External link:
http://arxiv.org/abs/2310.19007
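One standard tool for the problem this abstract describes, adding a dense reward signal without inadvertently changing which behaviors are optimal, is potential-based reward shaping (Ng et al., 1999). The sketch below shows that classical technique, not the method proposed in the paper; the potential function is an illustrative user-supplied callable:

def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s) to the
    environment reward, which leaves the optimal policy unchanged."""
    return r + gamma * potential(s_next) - potential(s)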
Author:
Lee, Jonathan N., Xie, Annie, Pacchiano, Aldo, Chandak, Yash, Finn, Chelsea, Nachum, Ofir, Brunskill, Emma
Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities … (a prompting schematic follows this entry)
External link:
http://arxiv.org/abs/2306.14892
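A schematic of what "few-shot, in-context" means operationally: the labelled examples go into the model's input and the prediction is read out of a single forward pass, with no weight updates. The prompt format, model.generate, train_pairs, and test_input below are placeholders, not artifacts of this paper:

def few_shot_prompt(examples, query):
    """Build a few-shot prompt: the 'learning' happens purely in-context,
    i.e. in the forward pass over these tokens."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# completion = model.generate(few_shot_prompt(train_pairs, test_input))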
Author:
Kostas, James E., Jordan, Scott M., Chandak, Yash, Theocharous, Georgios, Gupta, Dhawal, White, Martha, da Silva, Bruno Castro, Thomas, Philip S.
Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation … (an update-rule sketch follows this entry)
External link:
http://arxiv.org/abs/2305.09838
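A minimal sketch of the local, REINFORCE-style updates the coagent framework is built on: each stochastic unit adjusts only its own parameters using the shared reward, with no backpropagation through the rest of the network. The two-unit chain and all names below are illustrative, not code from the paper:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coagent_update(w, unit_input, sampled_output, reward, lr=0.1):
    """Likelihood-ratio (REINFORCE) update for one Bernoulli coagent that
    samples its output from sigmoid(w * unit_input); the unit is trained with
    the shared reward and its own local gradient only."""
    p = sigmoid(w * unit_input)
    grad_logp = (sampled_output - p) * unit_input
    return w + lr * reward * grad_logp

# one forward pass and one local update through a two-coagent chain x -> h -> a
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=2)
x = 1.0
h = float(rng.random() < sigmoid(w1 * x))        # coagent 1 samples its output
a = float(rng.random() < sigmoid(w2 * h))        # coagent 2 samples the action
r = 1.0 if a == x else -1.0                      # shared reward: copy the input
w1 = coagent_update(w1, x, h, r)                 # each coagent updates locally
w2 = coagent_update(w2, h, a, r)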