Showing 1 - 10 of 151 for search: '"Chen, Lichang"'
Author:
Liu, Tianqi, Xiong, Wei, Ren, Jie, Chen, Lichang, Wu, Junru, Joshi, Rishabh, Gao, Yang, Shen, Jiaming, Qin, Zhen, Yu, Tianhe, Sohn, Daniel, Makarova, Anastasiia, Liu, Jeremiah, Liu, Yuan, Piot, Bilal, Ittycheriah, Abe, Kumar, Aviral, Saleh, Mohammad
Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from…
External link:
http://arxiv.org/abs/2409.13156
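For context, the traditional reward-model training this entry refers to typically optimizes a Bradley-Terry pairwise loss over (chosen, rejected) responses to the same prompt. A minimal PyTorch sketch follows; `reward_model` and its calling convention are assumptions for illustration, not this paper's implementation:

```python
import torch.nn.functional as F

def pairwise_rm_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """Bradley-Terry loss: push r(prompt, chosen) above r(prompt, rejected).

    `reward_model` is assumed to map token ids to one scalar reward per sequence.
    """
    r_chosen = reward_model(prompt_ids, chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(prompt_ids, rejected_ids)  # shape: (batch,)
    # -log sigmoid(r_w - r_l): minimized when chosen outscores rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```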
In this paper, we study format biases in reinforcement learning from human feedback (RLHF). We observe that many widely-used preference models, including human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark, exhibit strong biases…
External link:
http://arxiv.org/abs/2409.11704
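One simple way to probe the format bias this entry describes is to score content-matched response pairs that differ only in surface formatting (lists, bold, length) and count how often the formatted variant wins. A hedged sketch; `score` is a hypothetical preference-model scorer, not an interface from the paper:

```python
def format_bias_rate(score, prompts, plain, formatted):
    """Fraction of content-matched pairs where the formatted variant wins.

    `score(prompt, response) -> float` is a hypothetical scorer; plain[i]
    and formatted[i] carry the same content in different surface forms.
    """
    wins = sum(
        score(p, f) > score(p, u)
        for p, u, f in zip(prompts, plain, formatted)
    )
    return wins / len(prompts)  # a rate near 0.5 would indicate no bias
```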
Author:
Chen, Lichang, Chen, Jiuhai, Liu, Chenxi, Kirchenbauer, John, Soselia, Davit, Zhu, Chen, Goldstein, Tom, Zhou, Tianyi, Huang, Heng
Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have…
External link:
http://arxiv.org/abs/2406.07657
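The offline DPO baseline named in this entry reduces preference learning to a single classification-style loss over log-probability ratios against a frozen reference model. A minimal sketch, assuming summed per-response log-probabilities are already computed:

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective on chosen (w) / rejected (l) responses.

    Inputs are summed log-probs under the policy and a frozen reference
    model; beta scales the implicit reward margin.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()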
Author:
Zhang, Xinlu, Chen, Zhiyu Zoey, Ye, Xi, Yang, Xianjun, Chen, Lichang, Wang, William Yang, Petzold, Linda Ruth
Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost reasoning abilities during LLM pretraining, its role in activating internal reasoning capabilities…
External link:
http://arxiv.org/abs/2405.20535
Existing 3D mesh shape evaluation metrics mainly focus on the overall shape but are usually less sensitive to local details. This makes them inconsistent with human evaluation, as human perception cares about both overall and detailed shape. In this…
External link:
http://arxiv.org/abs/2403.01619
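A classic overall-shape metric of the kind this entry critiques is the Chamfer distance between sampled point sets: because it averages nearest-neighbor errors, small local defects can be washed out. A minimal NumPy sketch of the standard metric (not the paper's proposed one):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Averaging nearest-neighbor distances rewards overall alignment but can
    mask small local defects, which is the insensitivity noted above.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```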
Author:
Chen, Ruibo, Wu, Yihan, Chen, Lichang, Liu, Guodong, He, Qi, Xiong, Tianyi, Liu, Chenxi, Guo, Junfeng, Huang, Heng
Data selection in instruction tuning emerges as a pivotal process for acquiring high-quality data and training instruction-following large language models (LLMs), but it is still a new and unexplored research area for vision-language models (VLMs)…
External link:
http://arxiv.org/abs/2402.12501
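Score-and-select pipelines of the kind this entry describes typically rate each candidate example and keep the top fraction. A hedged sketch of that generic pattern; `quality_score` is a hypothetical rater (a learned scorer or an LLM judge), and the paper's actual criterion may differ:

```python
def select_top_fraction(examples, quality_score, keep=0.1):
    """Keep the highest-scoring fraction of instruction-tuning examples.

    `quality_score(example) -> float` is a hypothetical rater; `keep` is an
    illustrative selection ratio, not a value from the paper.
    """
    ranked = sorted(examples, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep))]
```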
Making LLMs speak for different, especially minority groups of people, and generate statements supporting their diverse or even controversial perspectives is critical to creating an inclusive environment. However, existing LLMs lack sufficient controllability…
External link:
http://arxiv.org/abs/2402.10614
Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities, but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality…
External link:
http://arxiv.org/abs/2402.10110
Author:
Chen, Lichang, Zhu, Chen, Soselia, Davit, Chen, Jiuhai, Zhou, Tianyi, Goldstein, Tom, Huang, Heng, Shoeybi, Mohammad, Catanzaro, Bryan
In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or…
External link:
http://arxiv.org/abs/2402.07319
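A common first-line mitigation for the length hacking this entry studies is to subtract a length term from the reward during RLHF so that verbosity alone cannot inflate scores. The sketch below is one simple variant for illustration, not this paper's method (the paper pursues a disentangled reward instead), and the hyperparameters are invented:

```python
def length_penalized_reward(raw_reward, response_len, alpha=0.01, target_len=256):
    """Subtract a penalty for tokens beyond a target length.

    A simple guard against length hacking; alpha and target_len are
    illustrative hyperparameters, not values from the paper.
    """
    return raw_reward - alpha * max(0, response_len - target_len)
```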
Author:
Chen, Ruibo, Xiong, Tianyi, Wu, Yihan, Liu, Guodong, Hu, Zhengmian, Chen, Lichang, Chen, Yanshuo, Liu, Chenxi, Huang, Heng
This technical report delves into the application of GPT-4 Vision (GPT-4V) in the nuanced realm of COVID-19 image classification, leveraging the transformative potential of in-context learning to enhance diagnostic processes.
External link:
http://arxiv.org/abs/2310.18498
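In-context learning with a vision-language model, as described in this entry, amounts to placing a few labeled example images in the prompt ahead of the query image. A hedged sketch against the OpenAI chat API (message layout and model name reflect the vision API of that period and may have since changed; the prompt wording is an assumption):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_with_examples(example_urls_labels, query_url):
    """Few-shot image classification via in-context examples in one prompt.

    `example_urls_labels` is a list of (image_url, label) pairs shown to the
    model before the unlabeled query image.
    """
    content = [{"type": "text", "text": "Classify the final image."}]
    for url, label in example_urls_labels:
        content.append({"type": "image_url", "image_url": {"url": url}})
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append({"type": "image_url", "image_url": {"url": query_url}})
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content
```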