Showing 1 - 10 of 31 for search: '"Chen, Jiuhai"'
Large language models (LLMs), despite their breakthroughs on many challenging benchmark tasks, tend to generate verbose responses and lack control over output complexity, which human users usually prefer in practice. In this paper, …
External link:
http://arxiv.org/abs/2406.16229
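The abstract above concerns verbosity and poor control over output length. As a hypothetical illustration of the general idea (not this paper's method), one can impose an explicit word budget in the prompt and measure compliance; the function names and budget value below are my own:

```python
# Hypothetical illustration (not this paper's method): state a target length
# explicitly, then measure how far a response overshoots the budget.

def length_controlled_prompt(question: str, max_words: int) -> str:
    """Wrap a question with an explicit word budget."""
    return f"{question}\nAnswer in at most {max_words} words."

def words_over_budget(response: str, max_words: int) -> int:
    """Number of words by which a response exceeds the budget (0 if compliant)."""
    return max(0, len(response.split()) - max_words)

print(length_controlled_prompt("What is RLHF?", max_words=30))
print(words_over_budget("RLHF fine-tunes a model on human preference data.", 30))  # 0
```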
Author:
Chen, Jiuhai, Qadri, Rifaa, Wen, Yuxin, Jain, Neel, Kirchenbauer, John, Zhou, Tianyi, Goldstein, Tom
Most public instruction finetuning datasets are relatively small compared to the closed-source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need …
External link:
http://arxiv.org/abs/2406.10323
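The abstract mentions learning rate cooldown schedules. Below is a minimal sketch of one common shape, a constant phase followed by a linear cooldown to zero; the peak rate and cooldown fraction are illustrative, not values from the paper:

```python
# Constant-then-linear-cooldown learning-rate schedule; constants are illustrative.

def lr_with_cooldown(step: int, total_steps: int,
                     peak_lr: float = 2e-5, cooldown_frac: float = 0.2) -> float:
    """Hold peak_lr, then decay linearly to zero over the last cooldown_frac of steps."""
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < cooldown_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)

for s in (0, 799, 900, 999):
    print(s, lr_with_cooldown(s, total_steps=1000))
```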
Author:
Chen, Lichang, Chen, Jiuhai, Liu, Chenxi, Kirchenbauer, John, Soselia, Davit, Zhu, Chen, Goldstein, Tom, Zhou, Tianyi, Huang, Heng
Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preferences. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have …
External link:
http://arxiv.org/abs/2406.07657
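Since the abstract contrasts online RLHF with direct preference optimization (DPO), a minimal sketch of the standard DPO loss may help; it assumes the per-sequence log-probabilities have already been computed, and the input values are made up:

```python
import math

# Standard DPO loss on precomputed sequence log-probs:
# -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Negative log-sigmoid of the beta-scaled preference margin."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log1p(math.exp(-beta * margin))  # == -log(sigmoid(beta * margin))

print(dpo_loss(-12.0, -15.0, -13.0, -14.0))  # loss falls as the margin grows
```

Online variants instead re-sample responses from the current policy during training rather than relying on a fixed preference dataset, which is the setting this record addresses.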
Author:
Wang, Xiyao, Chen, Jiuhai, Wang, Zhaoyang, Zhou, Yuhang, Zhou, Yiyang, Yao, Huaxiu, Zhou, Tianyi, Goldstein, Tom, Bhatia, Parminder, Huang, Furong, Xiao, Cao
Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment …
External link:
http://arxiv.org/abs/2405.15973
Author:
Chen, Jiuhai, Mueller, Jonas
Large Language Models have become the de facto approach to sequence-to-sequence text generation tasks, but for specialized tasks/domains, a pretrained LLM lacks specific capabilities to produce accurate or well-formatted responses. Supervised fine-tuning …
External link:
http://arxiv.org/abs/2403.12776
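The abstract references supervised fine-tuning (SFT). The sketch below shows the usual way SFT examples are packed, with prompt tokens masked out of the loss via the -100 ignore index that PyTorch's cross-entropy skips; the token IDs are made up:

```python
from typing import List

IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores

def build_sft_labels(prompt_ids: List[int], response_ids: List[int]) -> dict:
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

example = build_sft_labels([101, 7592, 102], [2023, 2003, 1037, 3231])
print(example["labels"])  # [-100, -100, -100, 2023, 2003, 1037, 3231]
```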
Making LLMs speak for different groups of people, especially minorities, and generate statements supporting their diverse or even controversial perspectives is critical to creating an inclusive environment. However, existing LLMs lack sufficient controllability …
External link:
http://arxiv.org/abs/2402.10614
Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities, but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality …
External link:
http://arxiv.org/abs/2402.10110
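As a hypothetical sketch of the data-quality selection pattern this abstract alludes to (not the paper's specific method): score each instruction/response pair and keep only the top-scoring fraction for tuning. The scorer here is a toy stand-in:

```python
from typing import Callable, Dict, List

def select_top_quality(dataset: List[Dict[str, str]],
                       score_fn: Callable[[Dict[str, str]], float],
                       keep_fraction: float = 0.5) -> List[Dict[str, str]]:
    """Rank examples by score_fn and keep the top keep_fraction of them."""
    ranked = sorted(dataset, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

def toy_score(example: Dict[str, str]) -> float:
    # Toy stand-in: longer responses score higher. A real scorer would use
    # a reward model or an LLM judge instead.
    return float(len(example["response"].split()))

data = [
    {"instruction": "Define RLHF.",
     "response": "RLHF fine-tunes a model using human preference signals."},
    {"instruction": "Say hi.", "response": "Hi."},
]
print(select_top_quality(data, toy_score))  # keeps the more substantive pair
```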
Author:
Chen, Lichang, Zhu, Chen, Soselia, Davit, Chen, Jiuhai, Zhou, Tianyi, Goldstein, Tom, Huang, Heng, Shoeybi, Mohammad, Catanzaro, Bryan
In this work, we study the issue of reward hacking on response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose, but less helpful response from an LLM can often deceive LLMs or …
External link:
http://arxiv.org/abs/2402.07319
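To make the length-hacking failure mode concrete, here is an illustrative sketch (not this paper's approach): check how strongly reward correlates with response length, and, as a crude mitigation, subtract a per-word penalty. Requires Python 3.10+ for statistics.correlation; all values are made up:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def length_penalized_reward(raw_reward: float, response: str,
                            penalty_per_word: float = 0.01) -> float:
    """Subtract a per-word penalty so verbosity alone cannot inflate reward."""
    return raw_reward - penalty_per_word * len(response.split())

# A reward that tracks length this closely is a red flag for length hacking.
lengths = [12, 40, 95]
rewards = [0.2, 0.5, 0.9]
print(correlation(lengths, rewards))               # close to 1.0
print(length_penalized_reward(0.9, "word " * 95))  # 0.9 - 0.95 = -0.05
```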
Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, LLM output control and alignment with the input can be refined through instruction tuning. However, …
External link:
http://arxiv.org/abs/2310.11716
Author:
Chen, Jiuhai, Mueller, Jonas
We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generates. Our uncertainty quantification technique works for any LLM accessible …
External link:
http://arxiv.org/abs/2308.16175
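A minimal sketch of one common black-box ingredient of such uncertainty quantification: resample several answers to the same question and use agreement with the original answer as a crude confidence signal. This assumes a generic ask_llm callable and exact-match agreement; it is an illustration, not BSDetector's full scoring method:

```python
import random
from typing import Callable

def consistency_confidence(question: str, answer: str,
                           ask_llm: Callable[[str], str], k: int = 5) -> float:
    """Fraction of k resampled answers that exactly match the given answer."""
    samples = [ask_llm(question) for _ in range(k)]
    return sum(s.strip() == answer.strip() for s in samples) / k

def fake_llm(question: str) -> str:
    # Stand-in for any black-box LLM API; answers "Paris" 80% of the time.
    return "Paris" if random.random() < 0.8 else "Lyon"

print(consistency_confidence("Capital of France?", "Paris", fake_llm))
```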