Showing 1 - 10 of 32 results for search: '"Yuan, Lifan"'
Author:
He, Bingxiang, Ding, Ning, Qian, Cheng, Deng, Jia, Cui, Ganqu, Yuan, Lifan, Gao, Huan-ang, Chen, Huimin, Liu, Zhiyuan, Sun, Maosong
Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but little of the mechanism has been understood. Existing work has largely been confined to the task level, without considering that…
External link:
http://arxiv.org/abs/2406.11721
Author:
Yuan, Lifan, Cui, Ganqu, Wang, Hanbin, Ding, Ning, Wang, Xingyao, Deng, Jia, Shan, Boji, Chen, Huimin, Xie, Ruobing, Lin, Yankai, Liu, Zhenghao, Zhou, Bowen, Peng, Hao, Liu, Zhiyuan, Sun, Maosong
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathemati…
External link:
http://arxiv.org/abs/2404.02078
Author:
Guo, Yiju, Cui, Ganqu, Yuan, Lifan, Ding, Ning, Sun, Zexu, Sun, Bowen, Chen, Huimin, Xie, Ruobing, Zhou, Jie, Lin, Yankai, Liu, Zhiyuan, Sun, Maosong
Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax" - a c…
External link:
http://arxiv.org/abs/2402.19085
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where re…
External link:
http://arxiv.org/abs/2402.05369
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generati…
External link:
http://arxiv.org/abs/2402.01030
Can large language models (LLMs) express their uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses? This work aims to systematically investigate LLMs' behaviors in such situations, emphasizing th…
External link:
http://arxiv.org/abs/2311.09731
Author:
Cui, Ganqu, Yuan, Lifan, Ding, Ning, Yao, Guanming, He, Bingxiang, Zhu, Wei, Ni, Yuan, Xie, Guotong, Xie, Ruobing, Lin, Yankai, Liu, Zhiyuan, Sun, Maosong
Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small siz…
External link:
http://arxiv.org/abs/2310.01377
Large language models (LLMs) are often augmented with tools to solve complex tasks. By generating code snippets and executing them through task-specific Application Programming Interfaces (APIs), they can offload certain functions to dedicated extern…
External link:
http://arxiv.org/abs/2309.17428
To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn excha…
External link:
http://arxiv.org/abs/2309.10691
Author:
Yuan, Lifan, Chen, Yangyi, Cui, Ganqu, Gao, Hongcheng, Zou, Fangyuan, Cheng, Xingyi, Ji, Heng, Liu, Zhiyuan, Sun, Maosong
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. T…
External link:
http://arxiv.org/abs/2306.04618