Variational Bayesian Group-Level Sparsification for Knowledge Distillation

Authors: Yue Ming, Hao Fu, Yibo Jiang, Hui Yu
Language: English
Publication year: 2020
Source: IEEE Access, Vol. 8, pp. 126628-126636 (2020)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3008854
Description: Deep neural networks are capable of learning powerful representations, but they are often limited by heavy network architectures and high computational cost. Knowledge distillation (KD) is an effective way to perform model compression and inference acceleration, yet the resulting student models still retain parameter redundancy. To tackle these issues, we propose a novel approach, called Variational Bayesian Group-level Sparsification for Knowledge Distillation (VBGS-KD), to distill a large teacher network into a small, sparse student network while preserving accuracy. We impose a sparsity-inducing prior on groups of parameters in the student model and introduce a variational Bayesian approximation to learn structured sparsity, which can effectively prune a large portion of the weights. The pruning threshold is learned during training without extra fine-tuning. The proposed method learns robust student networks that achieve satisfying accuracy and compact model sizes compared with state-of-the-art methods. We have validated our method on the MNIST and CIFAR-10 datasets, observing 90.3% sparsity with a 0.19% accuracy gain on MNIST. Extensive experiments on the CIFAR-10 dataset demonstrate the efficiency of the proposed approach.
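
To make the abstract's mechanism concrete, the following is a minimal PyTorch sketch of the general idea: multiplicative Gaussian gates on output groups of a student layer, a variational-dropout-style KL regularizer toward a sparsity-inducing log-uniform prior (the Molchanov et al. 2017 approximation), and a Hinton-style distillation loss. All names (VBGroupLinear, kd_loss, the log-alpha pruning threshold of 3.0, the loss weights) are illustrative assumptions, not the authors' released code or exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VBGroupLinear(nn.Module):
    """Linear layer whose output neurons (groups) carry multiplicative
    Gaussian gates z ~ N(mu, sigma^2); a large log alpha = log(sigma^2 / mu^2)
    marks a group as prunable (hypothetical variational-dropout-style layer)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.mu = nn.Parameter(torch.ones(out_features))        # gate means
        self.log_sigma2 = nn.Parameter(torch.full((out_features,), -8.0))

    def log_alpha(self):
        return self.log_sigma2 - 2.0 * torch.log(torch.abs(self.mu) + 1e-8)

    def forward(self, x):
        out = F.linear(x, self.weight, self.bias)
        if self.training:   # reparameterized sample of the group gates
            eps = torch.randn_like(self.mu)
            z = self.mu + torch.exp(0.5 * self.log_sigma2) * eps
        else:               # at test time, hard-prune high-alpha groups
            z = self.mu * (self.log_alpha() < 3.0).float()
        return out * z      # gates broadcast over the batch dimension

    def kl(self):
        # Polynomial approximation (Molchanov et al., 2017) of the KL between
        # the gate posterior and the sparsity-inducing log-uniform prior.
        la = self.log_alpha()
        k1, k2, k3 = 0.63576, 1.8732, 1.48695
        return -(k1 * torch.sigmoid(k2 + k3 * la)
                 - 0.5 * F.softplus(-la) - k1).sum()

def kd_loss(student_logits, teacher_logits, targets, T=4.0, lam=0.9):
    """Hinton-style distillation: softened teacher/student KL + hard-label CE."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, targets)
    return lam * soft + (1.0 - lam) * hard

Under these assumptions, the training objective would be kd_loss plus a small weight times the sum of each layer's kl() term, so that groups whose log alpha drifts past the threshold are zeroed out at inference time and the pruning decision emerges during training rather than from post-hoc fine-tuning, matching the abstract's claim.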
Database: Directory of Open Access Journals