Showing 1 - 10 of 31 for search: '"Chen, Jiuhai"'
Large language models (LLMs), despite their breakthroughs on many challenging benchmark tasks, tend to generate verbose responses and lack control over output complexity, which human users usually prefer in practice. In this paper, …
External link:
http://arxiv.org/abs/2406.16229
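The abstract above concerns verbosity and poor control over output length. As a hypothetical illustration of the general idea (not this paper's method), one can impose an explicit word budget in the prompt and measure compliance; the function names and budget value below are my own:

```python
# Hypothetical illustration (not this paper's method): state a target length
# explicitly, then measure how far a response overshoots the budget.

def length_controlled_prompt(question: str, max_words: int) -> str:
    """Wrap a question with an explicit word budget."""
    return f"{question}\nAnswer in at most {max_words} words."

def words_over_budget(response: str, max_words: int) -> int:
    """Number of words by which a response exceeds the budget (0 if compliant)."""
    return max(0, len(response.split()) - max_words)

print(length_controlled_prompt("What is RLHF?", max_words=30))
print(words_over_budget("RLHF fine-tunes a model on human preference data.", 30))  # 0
```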
Author:
Chen, Jiuhai, Qadri, Rifaa, Wen, Yuxin, Jain, Neel, Kirchenbauer, John, Zhou, Tianyi, Goldstein, Tom
Most public instruction finetuning datasets are relatively small compared to the closed-source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need …
External link:
http://arxiv.org/abs/2406.10323
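The abstract mentions learning rate cooldown schedules. Below is a minimal sketch of one common shape, a constant phase followed by a linear cooldown to zero; the peak rate and cooldown fraction are illustrative, not values from the paper:

```python
# Constant-then-linear-cooldown learning-rate schedule; constants are illustrative.

def lr_with_cooldown(step: int, total_steps: int,
                     peak_lr: float = 2e-5, cooldown_frac: float = 0.2) -> float:
    """Hold peak_lr, then decay linearly to zero over the last cooldown_frac of steps."""
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < cooldown_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)

for s in (0, 799, 900, 999):
    print(s, lr_with_cooldown(s, total_steps=1000))
```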
Author:
Chen, Lichang, Chen, Jiuhai, Liu, Chenxi, Kirchenbauer, John, Soselia, Davit, Zhu, Chen, Goldstein, Tom, Zhou, Tianyi, Huang, Heng
Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preferences. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have …
External link:
http://arxiv.org/abs/2406.07657
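Since the abstract contrasts online RLHF with direct preference optimization (DPO), a minimal sketch of the standard DPO loss may help; it assumes the per-sequence log-probabilities have already been computed, and the input values are made up:

```python
import math

# Standard DPO loss on precomputed sequence log-probs:
# -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Negative log-sigmoid of the beta-scaled preference margin."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log1p(math.exp(-beta * margin))  # == -log(sigmoid(beta * margin))

print(dpo_loss(-12.0, -15.0, -13.0, -14.0))  # loss falls as the margin grows
```

Online variants instead re-sample responses from the current policy during training rather than relying on a fixed preference dataset, which is the setting this record addresses.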
Author:
Wang, Xiyao, Chen, Jiuhai, Wang, Zhaoyang, Zhou, Yuhang, Zhou, Yiyang, Yao, Huaxiu, Zhou, Tianyi, Goldstein, Tom, Bhatia, Parminder, Huang, Furong, Xiao, Cao
Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment …
External link:
http://arxiv.org/abs/2405.15973
Author:
Chen, Jiuhai, Mueller, Jonas
Large Language Models have become the de facto approach to sequence-to-sequence text generation tasks, but for specialized tasks/domains, a pretrained LLM lacks specific capabilities to produce accurate or well-formatted responses. Supervised fine-tuning …
External link:
http://arxiv.org/abs/2403.12776
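The abstract references supervised fine-tuning (SFT). The sketch below shows the usual way SFT examples are packed, with prompt tokens masked out of the loss via the -100 ignore index that PyTorch's cross-entropy skips; the token IDs are made up:

```python
from typing import List

IGNORE_INDEX = -100  # label value PyTorch's cross-entropy ignores

def build_sft_labels(prompt_ids: List[int], response_ids: List[int]) -> dict:
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

example = build_sft_labels([101, 7592, 102], [2023, 2003, 1037, 3231])
print(example["labels"])  # [-100, -100, -100, 2023, 2003, 1037, 3231]
```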
Making LLMs speak for different groups of people, especially minorities, and generate statements supporting their diverse or even controversial perspectives is critical to creating an inclusive environment. However, existing LLMs lack sufficient controllability …
External link:
http://arxiv.org/abs/2402.10614
Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities, but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality …
External link:
http://arxiv.org/abs/2402.10110
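As a hypothetical sketch of the data-quality selection pattern this abstract alludes to (not the paper's specific method): score each instruction/response pair and keep only the top-scoring fraction for tuning. The scorer here is a toy stand-in:

```python
from typing import Callable, Dict, List

def select_top_quality(dataset: List[Dict[str, str]],
                       score_fn: Callable[[Dict[str, str]], float],
                       keep_fraction: float = 0.5) -> List[Dict[str, str]]:
    """Rank examples by score_fn and keep the top keep_fraction of them."""
    ranked = sorted(dataset, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

def toy_score(example: Dict[str, str]) -> float:
    # Toy stand-in: longer responses score higher. A real scorer would use
    # a reward model or an LLM judge instead.
    return float(len(example["response"].split()))

data = [
    {"instruction": "Define RLHF.",
     "response": "RLHF fine-tunes a model using human preference signals."},
    {"instruction": "Say hi.", "response": "Hi."},
]
print(select_top_quality(data, toy_score))  # keeps the more substantive pair
```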
Author:
Chen, Lichang, Zhu, Chen, Soselia, Davit, Chen, Jiuhai, Zhou, Tianyi, Goldstein, Tom, Huang, Heng, Shoeybi, Mohammad, Catanzaro, Bryan
In this work, we study the issue of reward hacking on response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose, but less helpful response from an LLM can often deceive LLMs or …
External link:
http://arxiv.org/abs/2402.07319
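To make the length-hacking failure mode concrete, here is an illustrative sketch (not this paper's approach): check how strongly reward correlates with response length, and, as a crude mitigation, subtract a per-word penalty. Requires Python 3.10+ for statistics.correlation; all values are made up:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def length_penalized_reward(raw_reward: float, response: str,
                            penalty_per_word: float = 0.01) -> float:
    """Subtract a per-word penalty so verbosity alone cannot inflate reward."""
    return raw_reward - penalty_per_word * len(response.split())

# A reward that tracks length this closely is a red flag for length hacking.
lengths = [12, 40, 95]
rewards = [0.2, 0.5, 0.9]
print(correlation(lengths, rewards))               # close to 1.0
print(length_penalized_reward(0.9, "word " * 95))  # 0.9 - 0.95 = -0.05
```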
Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, LLM output control and alignment with the input can be refined through instruction tuning. However, …
External link:
http://arxiv.org/abs/2310.11716
Author:
Chen, Jiuhai, Mueller, Jonas
We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generates. Our uncertainty quantification technique works for any LLM accessible …
External link:
http://arxiv.org/abs/2308.16175
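A minimal sketch of one common black-box ingredient of such uncertainty quantification: resample several answers to the same question and use agreement with the original answer as a crude confidence signal. This assumes a generic ask_llm callable and exact-match agreement; it is an illustration, not BSDetector's full scoring method:

```python
import random
from typing import Callable

def consistency_confidence(question: str, answer: str,
                           ask_llm: Callable[[str], str], k: int = 5) -> float:
    """Fraction of k resampled answers that exactly match the given answer."""
    samples = [ask_llm(question) for _ in range(k)]
    return sum(s.strip() == answer.strip() for s in samples) / k

def fake_llm(question: str) -> str:
    # Stand-in for any black-box LLM API; answers "Paris" 80% of the time.
    return "Paris" if random.random() < 0.8 else "Lyon"

print(consistency_confidence("Capital of France?", "Paris", fake_llm))
```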