Showing 1 - 4 of 4
for search: '"Chidambaram, Keertana"'
Author:
Lau, Allison, Choi, Younwoo, Balazadeh, Vahid, Chidambaram, Keertana, Syrgkanis, Vasilis, Krishnan, Rahul G.
Reinforcement Learning from Human Feedback (RLHF) is widely used to align Language Models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We present the Pref…
External link:
http://arxiv.org/abs/2410.14001
RLHF has emerged as a pivotal step in aligning language models with human objectives and values. It typically involves learning a reward model from human preference data and then using reinforcement learning to update the generative model accordingly…
External link:
http://arxiv.org/abs/2405.15065
Author:
Balazadeh, Vahid, Chidambaram, Keertana, Nguyen, Viet, Krishnan, Rahul G., Syrgkanis, Vasilis
We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information. These demonstrations can be viewed as solving related but slightly different ta…
External link:
http://arxiv.org/abs/2404.07266
Published in:
Proceedings of the Americas Conference on Information Systems (AMCIS); 2018, p1-10, 10p