AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation

Authors:	Qinhong Zhou, Peng Li, Yang Liu, Yuyang Guan, Qizhou Xing, Ming Chen, Maosong Sun
Language:	English
Year of publication:	2023
Subject:	Knowledge distillation; Pre-trained language model; Active learning; Electronic computers. Computer science; QA75.5-76.95
Source:	AI Open, Vol. 4, pp. 56-63 (2023)
Document type:	article
ISSN:	2666-6510
DOI:	10.1016/j.aiopen.2023.08.005
Description:	Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. However, as pre-trained language models (PLMs) grow larger, the computational cost of KD becomes prohibitive. Computing the KD loss on only part of the training set is a promising way to accelerate KD. Existing works, however, heuristically apply a single static data selection strategy throughout the KD process and show inconsistent improvements across distillation scenarios. In this work, we conduct a thorough study of typical data selection strategies for KD and show that this inconsistency arises because the best strategy depends on several factors, including the task, the selected data size, and the training stage. To adapt to these factors automatically, we propose AdaDS, a framework that learns to choose the data selection strategy adaptively during the KD process. Experimental results show that the proposed method is effective across tasks and selected data sizes in both the fine-tuning and pre-training stages, achieving performance comparable to DistilBERT with only 10% of the queries to the teacher model.
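The description does not say which selection strategies AdaDS uses or how it learns to choose among them, so the following is only a minimal sketch of the general idea under stated assumptions: three common uncertainty-style strategies (random, student-entropy, top-2 margin) pick which examples get teacher queries, and a UCB1 bandit picks the strategy each round. The function names, the UCB1 chooser, and the reward signal are all hypothetical stand-ins, not the authors' method.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Student's predicted class probabilities, one vector per training example.
Probs = Sequence[Sequence[float]]
Selector = Callable[[Probs, int, random.Random], List[int]]


def random_select(probs: Probs, k: int, rng: random.Random) -> List[int]:
    """Uniformly sample k example indices."""
    return rng.sample(range(len(probs)), k)


def entropy_select(probs: Probs, k: int, rng: random.Random) -> List[int]:
    """Pick the k examples where the student's prediction has the highest entropy."""
    def entropy(p: Sequence[float]) -> float:
        return -sum(x * math.log(x) for x in p if x > 0)
    return sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)[:k]


def margin_select(probs: Probs, k: int, rng: random.Random) -> List[int]:
    """Pick the k examples with the smallest gap between the student's top-2 probabilities."""
    def margin(p: Sequence[float]) -> float:
        top = sorted(p, reverse=True)
        return top[0] - top[1]
    return sorted(range(len(probs)), key=lambda i: margin(probs[i]))[:k]


# Hypothetical pool of static strategies the adaptive chooser switches between.
STRATEGIES: Dict[str, Selector] = {
    "random": random_select,
    "entropy": entropy_select,
    "margin": margin_select,
}


class UCBStrategyChooser:
    """UCB1 bandit over selection strategies; a hypothetical stand-in, not AdaDS itself."""

    def __init__(self, names: Sequence[str]) -> None:
        self.names = list(names)
        self.counts = {n: 0 for n in self.names}
        self.values = {n: 0.0 for n in self.names}  # running mean reward per strategy
        self.t = 0

    def choose(self) -> str:
        self.t += 1
        for n in self.names:  # try every strategy once before using the UCB index
            if self.counts[n] == 0:
                return n
        return max(
            self.names,
            key=lambda n: self.values[n] + math.sqrt(2 * math.log(self.t) / self.counts[n]),
        )

    def update(self, name: str, reward: float) -> None:
        self.counts[name] += 1
        self.values[name] += (reward - self.values[name]) / self.counts[name]


if __name__ == "__main__":
    rng = random.Random(0)
    # Fake normalized student predictions for 1000 examples, 3 classes.
    student_probs = []
    for _ in range(1000):
        raw = [rng.random() for _ in range(3)]
        student_probs.append([x / sum(raw) for x in raw])
    # Hypothetical fixed rewards (e.g. dev-set gain after a KD step); real ones are noisy.
    fake_reward = {"random": 0.1, "entropy": 0.5, "margin": 0.3}
    chooser = UCBStrategyChooser(list(STRATEGIES))
    for _ in range(100):
        name = chooser.choose()
        batch = STRATEGIES[name](student_probs, 100, rng)  # only 10% get teacher queries
        chooser.update(name, fake_reward[name])
    print(chooser.counts)
```

With fixed simulated rewards, the chooser gradually concentrates on the highest-reward strategy while still occasionally trying the others. In an actual KD loop, the reward would have to come from some measurable signal such as validation improvement, which is an assumption of this sketch rather than something stated in the record.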
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/8e80d3d76a954bf5977abd99aefa7e81