TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models

Autor:	Yan, Junbing, Wang, Chengyu, Zhang, Taolin, He, Xiaofeng, Huang, Jun, Huang, Longtao, Xue, Hui, Zhang, Wei
Rok vydání:	2024
Předmět:	Computer Science - Computation and Language
Druh dokumentu:	Working Paper
Popis:	KEPLMs are pre-trained models that utilize external knowledge to enhance language understanding. Previous language models facilitated knowledge acquisition by incorporating knowledge-related pre-training tasks learned from relation triples in knowledge graphs. However, these models do not prioritize learning embeddings for entity-related tokens. Moreover, updating the entire set of parameters in KEPLMs is computationally demanding. This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. We observe that entities in text corpora usually follow the long-tail distribution, where the representations of some entities are suboptimally optimized and hinder the pre-training process for KEPLMs. To tackle this, we employ a robust approach to inject knowledge triples and employ a knowledge-augmented memory bank to capture valuable information. Furthermore, updating a small subset of neurons in the feed-forward networks (FFNs) that store factual knowledge is both sufficient and efficient. Specifically, we utilize dynamic knowledge routing to identify knowledge paths in FFNs and selectively update parameters during pre-training. Experimental results show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2403.11203 Zobrazit plný text záznamu View this record from Arxiv