Authors:
Li, Yanshi, Xiong, Shaopan, Chen, Gengru, Li, Xiaoyang, Luo, Yijia, Zhang, Xingyao, Huang, Yanhui, Bu, Xingyuan, Tan, Yingshui, Yuan, Chun, Wang, Jiamang, Su, Wenbo, Zheng, Bo
Reinforcement Learning from Human Feedback (RLHF) has proven highly effective in aligning Large Language Models (LLMs) with human preferences. However, the original RLHF typically optimizes under an overall reward, which can lead to a suboptimal learning…
External link:
http://arxiv.org/abs/2411.00809