Effective Online Knowledge Distillation via Attention-Based Model Ensembling

Author:	Diana-Laura Borza, Adrian Sergiu Darabant, Tudor Alexandru Ileni, Alexandru-Ion Marinescu
Language:	English
Year of publication:	2022
Subject:	online knowledge distillation; ensemble learning; attention aggregation; deep learning; Mathematics; QA1-939
Source:	Mathematics, Vol 10, Iss 22, p 4285 (2022)
Document type:	article
ISSN:	2227-7390
DOI:	10.3390/math10224285
Description:	Large-scale deep learning models have achieved impressive results on a variety of tasks; however, deploying them on edge or mobile devices remains a challenge because of limited memory and computational capability. Knowledge distillation is an effective model compression technique that boosts the performance of a lightweight student network by transferring knowledge from a more complex model or an ensemble of models. Because of its reduced size, the lightweight model is better suited to deployment on edge devices. In this paper, we introduce an online knowledge distillation framework that relies on an original attention mechanism to combine the predictions of a cohort of lightweight (student) networks into a powerful ensemble, and uses this ensemble as the distillation signal. The proposed aggregation strategy uses the predictions of the individual students, together with ground truth data, to determine the set of weights needed for ensembling these predictions. This mechanism is used only during training. At test or inference time, a single lightweight student is extracted and used. Extensive experiments on several image classification benchmarks, both training models from scratch (on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets) and using transfer learning (on the Oxford Pets and Oxford Flowers datasets), showed that the proposed framework always improves the accuracy of the knowledge-distilled students. Moreover, for the ResNet architecture, we observed that the knowledge-distilled model achieves a higher accuracy than a deeper, individually trained ResNet model.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/5a308cbcac93469790df7e2d5f125d17
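
The aggregation step summarized in the abstract (student predictions plus ground truth determine a set of ensembling weights, and the ensemble then serves as the distillation signal) can be illustrated with a minimal sketch. This is not the authors' attention mechanism, which the record does not specify. As an assumption, each student is scored by the log-probability it assigns to the true class, and a softmax over those scores gives the weights. The function names `softmax`, `attention_ensemble`, and `kl_divergence` are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(student_logits, label):
    """Combine per-student logits for ONE sample into an ensemble distribution.

    Assumption (not from the paper): a student's score is the log-probability
    it gives to the ground-truth class; weights are a softmax over the scores.
    """
    probs = [softmax(logits) for logits in student_logits]
    scores = [math.log(p[label] + 1e-12) for p in probs]
    weights = softmax(scores)
    num_classes = len(probs[0])
    ensemble = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]
    return weights, ensemble


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), used here as a per-student distillation loss."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
    )


if __name__ == "__main__":
    # Three hypothetical students, three classes, true class 0.
    logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [1.0, 1.0, 1.0]]
    weights, ensemble = attention_ensemble(logits, label=0)
    for i, student in enumerate(logits):
        loss = kl_divergence(ensemble, softmax(student))
        print(f"student {i}: weight={weights[i]:.3f} distill_loss={loss:.4f}")
```

In a training loop, each student's total loss would typically add this distillation term to its ordinary cross-entropy loss. Because the weights depend on the labels, the aggregation exists only during training, which matches the abstract's statement that a single student is used at inference time.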