Showing 1 - 9 of 9 for search: '"Tajwar, Fahim"'
Author:
Tajwar, Fahim, Singh, Anikait, Sharma, Archit, Rafailov, Rafael, Schneider, Jeff, Xie, Tengyang, Ermon, Stefano, Finn, Chelsea, Kumar, Aviral
Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning…
External link:
http://arxiv.org/abs/2404.14367
Author:
Mark, Max Sobol, Sharma, Archit, Tajwar, Fahim, Rafailov, Rafael, Levine, Sergey, Finn, Chelsea
It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias…
External link:
http://arxiv.org/abs/2310.08558
In safety-critical applications of machine learning, it is often desirable for a model to be conservative, abstaining from making predictions on unknown inputs which are not well-represented in the training data. However, detecting unknown examples is…
External link:
http://arxiv.org/abs/2306.04974
Author:
Lee, Yoonho, Chen, Annie S., Tajwar, Fahim, Kumar, Ananya, Yao, Huaxiu, Liang, Percy, Finn, Chelsea
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning…
External link:
http://arxiv.org/abs/2210.11466
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world. A critical challenge to such autonomy is the presence of irreversible states which require external assistance to recover from, such as…
External link:
http://arxiv.org/abs/2210.10765
Author:
Zhou, Allan, Tajwar, Fahim, Robey, Alexander, Knowles, Tom, Pappas, George J., Hassani, Hamed, Finn, Chelsea
To generalize well, classifiers must learn to be invariant to nuisance transformations that do not alter an input's class. Many problems have "class-agnostic" nuisance transformations that apply similarly to all classes, such as lighting and background…
External link:
http://arxiv.org/abs/2203.09739
Out-of-distribution detection is an important component of reliable ML systems. Prior literature has proposed various methods (e.g., MSP (Hendrycks & Gimpel, 2017), ODIN (Liang et al., 2018), Mahalanobis (Lee et al., 2018)), claiming they are state-of-the-art…
External link:
http://arxiv.org/abs/2109.05554
Author:
Lee, Jihyeon, Brooks, Nina R., Tajwar, Fahim, Burke, Marshall, Ermon, Stefano, Lobell, David B., Biswas, Debashish, Luby, Stephen P.
Published in:
Proceedings of the National Academy of Sciences of the United States of America, 2021 Apr, 118(17), 1-10.
External link:
https://www.jstor.org/stable/27040176
Author:
Lee, Jihyeon, Brooks, Nina R., Tajwar, Fahim, Burke, Marshall, Ermon, Stefano, Lobell, David B., Biswas, Debashish, Luby, Stephen P.
Published in:
Proceedings of the National Academy of Sciences of the United States of America; 4/27/2021, Vol. 118 Issue 17, p1-10, 10p