Showing 1 - 10 of 32 results for search: '"Yuan, Lifan"'
Author:
He, Bingxiang, Ding, Ning, Qian, Cheng, Deng, Jia, Cui, Ganqu, Yuan, Lifan, Gao, Huan-ang, Chen, Huimin, Liu, Zhiyuan, Sun, Maosong
Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but little of the mechanism has been understood. Existing work has largely been confined to the task level, without considering that…
External link:
http://arxiv.org/abs/2406.11721
Author:
Yuan, Lifan, Cui, Ganqu, Wang, Hanbin, Ding, Ning, Wang, Xingyao, Deng, Jia, Shan, Boji, Chen, Huimin, Xie, Ruobing, Lin, Yankai, Liu, Zhenghao, Zhou, Bowen, Peng, Hao, Liu, Zhiyuan, Sun, Maosong
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathemati…
External link:
http://arxiv.org/abs/2404.02078
Author:
Guo, Yiju, Cui, Ganqu, Yuan, Lifan, Ding, Ning, Sun, Zexu, Sun, Bowen, Chen, Huimin, Xie, Ruobing, Zhou, Jie, Lin, Yankai, Liu, Zhiyuan, Sun, Maosong
Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax" - a c…
External link:
http://arxiv.org/abs/2402.19085
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where re…
External link:
http://arxiv.org/abs/2402.05369
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generati…
External link:
http://arxiv.org/abs/2402.01030
Can large language models (LLMs) express their uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses? This work aims to systematically investigate LLMs' behaviors in such situations, emphasizing th…
External link:
http://arxiv.org/abs/2311.09731
Author:
Cui, Ganqu, Yuan, Lifan, Ding, Ning, Yao, Guanming, He, Bingxiang, Zhu, Wei, Ni, Yuan, Xie, Guotong, Xie, Ruobing, Lin, Yankai, Liu, Zhiyuan, Sun, Maosong
Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small siz…
External link:
http://arxiv.org/abs/2310.01377
Large language models (LLMs) are often augmented with tools to solve complex tasks. By generating code snippets and executing them through task-specific Application Programming Interfaces (APIs), they can offload certain functions to dedicated extern…
External link:
http://arxiv.org/abs/2309.17428
To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn excha…
External link:
http://arxiv.org/abs/2309.10691
Author:
Yuan, Lifan, Chen, Yangyi, Cui, Ganqu, Gao, Hongcheng, Zou, Fangyuan, Cheng, Xingyi, Ji, Heng, Liu, Zhiyuan, Sun, Maosong
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. T…
External link:
http://arxiv.org/abs/2306.04618