Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

Author:	Cohen, Robert A.; Choi, Hyomin; Bajić, Ivan V.
Year of publication:	2021
Subject:	Computer Science - Machine Learning; Electrical Engineering and Systems Science - Image and Video Processing
Source:	IEEE Open Journal of Circuits and Systems, vol. 2, 13 May 2021, pp. 350-362
Document type:	Working Paper
DOI:	10.1109/OJCAS.2021.3072884
Description:	In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications. Comment: Accepted for publication in IEEE Open Journal of Circuits and Systems
Database:	arXiv
External link:	http://arxiv.org/abs/2105.07102
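
The description above mentions clipping ReLU activations to a range and then quantizing them coarsely. As a rough illustration only, the sketch below clips values to [0, c_max] and applies a plain uniform quantizer. It is not the paper's method: the function names, the hand-picked c_max, and the number of levels are all assumptions made for this example, and neither the paper's model-based optimal clipping range nor its modified entropy-constrained quantizer design is reproduced here.

```python
def clip_and_quantize(values, c_max, levels):
    """Clip each value to [0, c_max], then map it to an integer index
    in 0..levels-1 using uniform steps (illustrative assumption, not
    the entropy-constrained design described in the paper)."""
    if levels < 2 or c_max <= 0:
        raise ValueError("need levels >= 2 and c_max > 0")
    step = c_max / (levels - 1)
    indices = []
    for v in values:
        clipped = min(max(v, 0.0), c_max)        # clipping stage
        indices.append(int(clipped / step + 0.5))  # round to nearest level
    return indices


def dequantize(indices, c_max, levels):
    """Map integer indices back to reconstruction values in [0, c_max]."""
    step = c_max / (levels - 1)
    return [i * step for i in indices]


# Hypothetical ReLU-style activations; c_max=3.0 is chosen by hand.
activations = [-1.0, 0.0, 0.4, 1.0, 2.6, 9.0]
idx = clip_and_quantize(activations, c_max=3.0, levels=4)
rec = dequantize(idx, c_max=3.0, levels=4)
print(idx)  # [0, 0, 0, 1, 3, 3]
print(rec)  # [0.0, 0.0, 0.0, 1.0, 3.0, 3.0]
```

With 4 levels each index fits in 2 bits before any entropy coding; reaching sub-1-bit rates such as those reported in the description would additionally require an entropy coder, which this sketch omits.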