Showing 1 - 10 of 7,105 results for search: '"Liu, Kang"'
Knowledge distillation typically employs the Kullback-Leibler (KL) divergence to constrain the student model's output to exactly match the soft labels provided by the teacher model. However, sometimes the optimization direction of the KL divergence…
External link:
http://arxiv.org/abs/2409.17823
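For context, the conventional objective this abstract refers to fits in a few lines; the sketch below shows the standard temperature-scaled KL distillation loss (a textbook baseline, not the paper's proposed modification):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard KL-based distillation loss; a generic sketch,
    not the method proposed in the paper above."""
    # Soften both distributions with the temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes stable.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

# Usage: logits of shape (batch, num_classes)
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
loss = kd_loss(s, t)
loss.backward()
```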
Author:
Tang, Yuan-Han, Zhang, Xiaoran, Liu, Kang-Yuan, Xia, Fan, Zheng, Huijie, Liu, Xiaobing, Pan, Xin-Yu, Fan, Heng, Liu, Gang-Qin
As a point defect with unique spin and optical properties, the nitrogen-vacancy (NV) center in diamond has attracted much attention in the fields of quantum sensing, quantum simulation, and quantum networks. The optical properties of an NV center are crucial…
External link:
http://arxiv.org/abs/2409.17442
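The spin side of that sensing story is well-established textbook physics; as a small grounded illustration (not taken from the paper), the NV ground-state resonances shift with an axial magnetic field as follows:

```python
# Textbook NV ground-state spin physics, not from the paper: the m_s = 0 -> ±1
# microwave transition frequencies for a magnetic field along the NV axis.
D_GHZ = 2.87             # zero-field splitting of the NV ground state
GAMMA_MHZ_PER_MT = 28.0  # electron gyromagnetic ratio, ~28 MHz/mT

def odmr_frequencies_ghz(b_axial_mt):
    """Return the two ODMR dip frequencies (GHz) for an axial field in mT."""
    zeeman_ghz = GAMMA_MHZ_PER_MT * b_axial_mt / 1000.0
    return D_GHZ - zeeman_ghz, D_GHZ + zeeman_ghz

# A 1 mT axial field splits the resonance by ~56 MHz:
print(odmr_frequencies_ghz(1.0))  # (2.842, 2.898)
```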
In this paper, we propose $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., …
External link:
http://arxiv.org/abs/2409.13203
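The abstract gives only the method's name, so the sketch below shows just the generic teacher-rationale collection step that CoT-distillation methods of this kind build on; `query_teacher` is a hypothetical stand-in for an LLM API call, and the neural-symbolic split itself (the paper's contribution) is not reproduced:

```python
import json

def build_distillation_set(questions, query_teacher):
    """Collect (question, teacher rationale) pairs from a teacher LLM."""
    records = []
    for q in questions:
        reply = query_teacher(
            f"Question: {q}\nThink step by step, then give the final answer."
        )
        records.append({"question": q, "teacher_output": reply})
    return records

def save_for_finetuning(records, path="cot_train.jsonl"):
    # One JSON object per line, the usual format for supervised fine-tuning.
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
```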
Tool learning enables Large Language Models (LLMs) to interact with the external environment by invoking tools, enriching the accuracy and capability scope of LLMs. However, previous works predominantly focus on improving the model's tool-utilizing ability…
External link:
http://arxiv.org/abs/2409.13202
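A minimal picture of what "invoking tools" means in practice, sketched under a common ReAct-style convention; the tool registry and the `call_llm` function are illustrative assumptions, not the paper's design:

```python
import json

# Hypothetical tool registry; real systems expose search, code execution, etc.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_with_tools(question, call_llm, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model either emits a tool call as JSON or a final answer.
        step = call_llm(transcript)
        if step.startswith("TOOL:"):
            request = json.loads(step[len("TOOL:"):])
            result = TOOLS[request["name"]](request["input"])
            transcript += f"{step}\nOBSERVATION: {result}\n"
        else:
            return step  # final answer
    return "No answer within the step budget."
```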
Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chain-of-Thought (CoT) data distilled from LLMs, aiming to enhance their…
External link:
http://arxiv.org/abs/2409.13183
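The fine-tuning step such studies perform can be sketched in plain PyTorch; the Hugging Face-style model interface and the -100 label masking are standard conventions, assumed here rather than taken from the paper:

```python
import torch

def sft_step(model, optimizer, input_ids, labels):
    """One gradient step of next-token prediction on a distilled CoT example.

    `labels` equals `input_ids` with prompt positions set to -100 so the
    loss is computed only on the teacher-written rationale and answer.
    """
    model.train()
    out = model(input_ids=input_ids, labels=labels)  # HF-style causal LM
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```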
Large language models encapsulate knowledge and have demonstrated superior performance on various natural language processing tasks. Recent studies have localized this knowledge to specific model parameters, such as the MLP weights in intermediate layers…
External link:
http://arxiv.org/abs/2409.00617
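One simple way to probe such localization, sketched here as a toy and not as the paper's procedure: hook an intermediate MLP and rank its neurons by activation on a fact-bearing prompt (the module paths assume GPT-2's layout):

```python
import torch

def top_mlp_neurons(model, tokenizer, prompt, layer=6, k=10):
    """Rank MLP neurons in one layer by mean activation on the prompt."""
    acts = {}
    mlp = model.transformer.h[layer].mlp  # GPT-2 naming; adjust per model
    handle = mlp.c_fc.register_forward_hook(
        lambda mod, inp, out: acts.__setitem__("h", out.detach())
    )
    with torch.no_grad():
        model(**tokenizer(prompt, return_tensors="pt"))
    handle.remove()
    # Mean activation over tokens; the k largest are candidate "knowledge" units.
    scores = acts["h"][0].mean(dim=0)
    return torch.topk(scores, k).indices.tolist()
```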
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in-domain accuracy. Recent research has explored…
External link:
http://arxiv.org/abs/2408.12194
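For readers unfamiliar with dense retrieval, a standard dual-encoder setup looks like this; the checkpoint name is a common public model, not one evaluated in the paper:

```python
from sentence_transformers import SentenceTransformer, util

# Encode documents and queries into the same vector space.
model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["BERT is a bidirectional encoder.", "T5 frames tasks as text-to-text."]
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)

query_emb = model.encode("What is T5?", convert_to_tensor=True,
                         normalize_embeddings=True)
scores = util.cos_sim(query_emb, doc_emb)  # higher = more relevant
print(docs[int(scores.argmax())])
```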
LLMs have achieved success in many fields but are still troubled by problematic content in their training corpora. LLM unlearning aims to reduce its influence and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial…
External link:
http://arxiv.org/abs/2408.10682
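Gradient ascent on a forget set is the usual unlearning baseline that robustness studies like this one probe; a sketch under the assumption of an HF-style causal LM interface, not the paper's method:

```python
import torch

def unlearning_step(model, optimizer, forget_batch, max_grad_norm=1.0):
    """One gradient-ascent step on a batch of to-be-forgotten text.

    `forget_batch` is assumed to hold `input_ids` and `attention_mask`.
    """
    model.train()
    out = model(**forget_batch, labels=forget_batch["input_ids"])
    # Ascend on the forget set: minimizing the negative LM loss pushes the
    # model away from reproducing the problematic content.
    (-out.loss).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```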
In the realm of event prediction, temporal knowledge graph forecasting (TKGF) stands as a pivotal technique. Previous approaches face two challenges: they do not utilize experience during testing, and they rely on a single short-term history, which limits…
External link:
http://arxiv.org/abs/2408.07840
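A copy-style heuristic over historical quadruples (s, r, o, t) illustrates the kind of short-term-history baseline the abstract criticizes; purely illustrative, not the paper's model:

```python
from collections import defaultdict

def forecast_object(history, subj, rel, t_query, decay=0.99):
    """Score candidate objects for the query (subj, rel, ?, t_query)."""
    scores = defaultdict(float)
    for s, r, o, t in history:
        if s == subj and r == rel and t < t_query:
            scores[o] += decay ** (t_query - t)  # recent facts weigh more
    return max(scores, key=scores.get) if scores else None

history = [("A", "meets", "B", 1), ("A", "meets", "C", 2), ("A", "meets", "C", 5)]
print(forecast_object(history, "A", "meets", 6))  # -> "C"
```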
Knowledge editing aims to update outdated or incorrect knowledge in large language models (LLMs). However, current knowledge editing methods have limited scalability for lifelong editing. This study explores the fundamental reason why knowledge editing…
External link:
http://arxiv.org/abs/2408.07413
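Locate-then-edit methods typically realize an edit as a low-rank weight update; the toy rank-one edit below conveys the idea (real methods such as ROME add covariance preconditioning; this is not the paper's algorithm):

```python
import torch

def rank_one_edit(W, k, v_new):
    """Return W' with W' @ k_hat == v_new for the unit key k_hat = k / |k|."""
    k = k / k.norm()                      # unit key for a clean projection
    residual = v_new - W @ k              # what the layer currently gets wrong
    return W + torch.outer(residual, k)   # rank-one correction

W = torch.randn(4, 3)
k = torch.randn(3)
v = torch.randn(4)
W_edited = rank_one_edit(W, k, v)
print(torch.allclose(W_edited @ (k / k.norm()), v, atol=1e-5))  # True
```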