Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Authors:	Risso, M., Burrello, A., Benini, L., Macii, E., Poncino, M., Jahier Pagliari, D.
Language:	English
Year of publication:	2022
Subject:	FOS: Computer and information sciences; TinyML; Computer Science - Machine Learning; Deep Learning; NAS; Quantization; Machine Learning (cs.LG)
Source:	Proceedings of the 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)
Description:	Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27%, respectively, compared to a layer-wise approach, for the same accuracy.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c28c6e89164fdaddc76afd4f89e0a3b0 http://arxiv.org/abs/2206.08852
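
Illustrative note: the description contrasts layer-wise bit-width assignment with a per-channel one. The sketch below shows only that difference in storage cost, not the paper's NAS. It assumes symmetric uniform quantization and hand-picked bit-widths; the function names (`quantize_channel`, `quantize_channel_wise`, `weight_bits`) and the toy weights are hypothetical and do not come from the paper.

```python
def quantize_channel(weights, bits):
    """Symmetric uniform quantization of one output channel (an assumption,
    not necessarily the scheme used in the paper)."""
    qmax = 2 ** (bits - 1) - 1           # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_channel_wise(weight, bits_per_channel):
    """Quantize each output channel with its own bit-width."""
    return [quantize_channel(ch, b) for ch, b in zip(weight, bits_per_channel)]


def weight_bits(weight, bits_per_channel):
    """Storage needed for the quantized weights, in bits (scales excluded)."""
    return sum(len(ch) * b for ch, b in zip(weight, bits_per_channel))


# Toy layer: 3 output channels with 4 weights each (made-up values).
weight = [
    [0.90, -0.40, 0.10, 0.75],   # treated here as the "informative" channel
    [0.30, -0.20, 0.05, 0.10],
    [0.60, -1.00, 0.20, 0.00],
]

layer_wise = [8, 8, 8]      # one bit-width shared by the whole layer
channel_wise = [8, 4, 2]    # hand-picked per-channel bit-widths

print(weight_bits(weight, layer_wise))     # 96
print(weight_bits(weight, channel_wise))   # 56
print(quantize_channel_wise(weight, channel_wise)[2][0])  # [1, -1, 0, 0]
```

In a real search the per-channel bit-widths would be chosen by the optimization tool against accuracy, size, or energy targets rather than set by hand as they are here.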