Showing 1 - 10 of 307 for search: '"QUAN Xiaojun"'
While fusing heterogeneous open-source LLMs with varying architectures and sizes can potentially integrate the strengths of different models, existing fusion methods face significant challenges, such as vocabulary alignment and merging distribution…
External link: http://arxiv.org/abs/2412.03187
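The vocabulary-alignment challenge named in the entry above can be made concrete with a toy sketch. It is a minimal illustration, not the linked paper's method: two models with different tokenizers each emit a next-token distribution, the source distribution is projected onto the target vocabulary by exact string match (unmatched tokens fall back to <unk>), and the aligned distributions are blended by a simple average. The vocabularies, the matching rule, and the 50/50 weights are all assumptions.

```python
# Toy vocabulary alignment for distribution-level model fusion (illustrative
# sketch only; not the method of arXiv:2412.03187).
import torch

# Hypothetical toy vocabularies (token string -> id).
target_vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
source_vocab = {"the": 0, "sat": 1, "dog": 2, "<unk>": 3}

def align_distribution(src_probs: torch.Tensor) -> torch.Tensor:
    """Project a source-vocab distribution onto the target vocab.
    Tokens missing from the target vocab dump their mass on <unk>."""
    tgt_probs = torch.zeros(len(target_vocab))
    for tok, sid in source_vocab.items():
        tid = target_vocab.get(tok, target_vocab["<unk>"])
        tgt_probs[tid] += src_probs[sid]
    return tgt_probs

src = torch.softmax(torch.tensor([2.0, 1.0, 0.5, 0.1]), dim=-1)
tgt = torch.softmax(torch.tensor([1.5, 0.2, 1.0, 0.1]), dim=-1)
fused = 0.5 * tgt + 0.5 * align_distribution(src)  # simple average fusion
print(fused, fused.sum())  # mass is preserved, so this is still a distribution
```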
While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of…
External link: http://arxiv.org/abs/2408.07990
While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select advantageous models during training. Existing fusion…
External link: http://arxiv.org/abs/2408.04998
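The selection problem raised in the entry above, picking which source model to learn from, can be illustrated with one simple rule: for every training example, choose the source whose distribution gives the gold token the lowest cross-entropy. This rule and all names below are assumptions for illustration, not the linked paper's criterion.

```python
# Per-example source-model selection for fusion training (a hypothetical
# rule for illustration; not the criterion of arXiv:2408.04998).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim, n = 32, 16, 8
sources = [torch.nn.Linear(dim, vocab) for _ in range(3)]  # stand-ins for source LLMs
h = torch.randn(n, dim)                # hidden states for n training tokens
gold = torch.randint(0, vocab, (n,))   # gold next-token ids

# Score each source by its per-example cross-entropy on the gold tokens,
# then pick the lowest-loss (most advantageous) source per example.
losses = torch.stack([
    F.cross_entropy(m(h), gold, reduction="none") for m in sources
])                                     # shape: (num_sources, n)
choice = losses.argmin(dim=0)          # chosen source index per example
print(choice.tolist())
```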
We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to leverage their complementary strengths. One of the challenges in model fusion is the high computational load, i.e., fine-tuning or aligning vocabularies via…
External link: http://arxiv.org/abs/2407.19807
The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies…
External link: http://arxiv.org/abs/2406.10813
With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy…
External link: http://arxiv.org/abs/2406.10594
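The layer-redundancy observation in the entry above suggests a simple probe, sketched below under assumptions of my own (toy blocks, a cosine-similarity score, a fixed prune count): rate each block by how little it changes its input on a calibration batch and drop the highest-scoring blocks. This is a generic illustration, not necessarily the linked paper's criterion.

```python
# Generic layer-redundancy probe and pruning sketch (illustrative only;
# not the method of arXiv:2406.10594).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
blocks = nn.ModuleList(nn.Sequential(nn.Linear(16, 16), nn.GELU()) for _ in range(8))
x = torch.randn(4, 16)  # stand-in for hidden states from a calibration batch

scores, h = [], x
for i, blk in enumerate(blocks):
    out = blk(h)
    # High input/output cosine similarity ~ the block barely changes anything.
    sim = F.cosine_similarity(h, out, dim=-1).mean().item()
    scores.append((sim, i))
    h = out

k = 2  # number of blocks to prune
redundant = {i for _, i in sorted(scores, reverse=True)[:k]}
pruned = nn.ModuleList(b for i, b in enumerate(blocks) if i not in redundant)
print(f"dropped blocks {sorted(redundant)}; kept {len(pruned)} of {len(blocks)}")
```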
Author: Chen, Hongzhan; Chen, Hehong; Yan, Ming; Xu, Wenshen; Gao, Xing; Shen, Weizhou; Quan, Xiaojun; Li, Chenliang; Zhang, Ji; Huang, Fei; Zhou, Jingren
Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing…
External link: http://arxiv.org/abs/2403.13679
Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility…
External link: http://arxiv.org/abs/2402.16107
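The lightweight continual training that the entry above describes can be read as distillation toward fused teacher distributions. The sketch below shows that general shape only, with toy linear modules standing in for LLMs, vocabularies assumed to be pre-aligned, and plain averaging as the fusion operator; none of these choices are claimed to match FuseLLM's actual fusion function.

```python
# Distribution-level knowledge fusion as distillation (a sketch under the
# assumptions above; not FuseLLM's exact procedure from arXiv:2402.16107).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 32, 16
student = nn.Linear(dim, vocab)                       # target model head
teachers = [nn.Linear(dim, vocab) for _ in range(2)]  # pre-aligned source heads
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(200):
    h = torch.randn(8, dim)  # stand-in for hidden states of training tokens
    with torch.no_grad():
        # Fuse the teachers' next-token distributions by simple averaging.
        fused = torch.stack([F.softmax(t(h), dim=-1) for t in teachers]).mean(0)
    log_p = F.log_softmax(student(h), dim=-1)
    loss = F.kl_div(log_p, fused, reduction="batchmean")  # distill toward fusion
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final KL to fused teachers: {loss.item():.4f}")
```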
Author: Yang, Haihui; Quan, Xiaojun
Chinese grammatical error correction (CGEC) faces serious overcorrection challenges when employing autoregressive generative models such as sequence-to-sequence (Seq2Seq) models and decoder-only large language models (LLMs). While previous methods aim…
External link: http://arxiv.org/abs/2402.04601
While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination.
External link: http://arxiv.org/abs/2401.10768