Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
Author: | Byeongwook Kim, Parichay Kapoor, Baeseong Park, Se Jung Kwon, Gu-Yeon Wei, Dongsoo Lee |
---|---|
Publication year: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer science; Machine Learning (stat.ML); 010501 environmental sciences; Encryption; Viterbi algorithm; 01 natural sciences; Machine Learning (cs.LG); symbols.namesake; Statistics - Machine Learning; 0103 physical sciences; Overhead (computing); Pruning (decision trees); Quantization (image processing); 0105 earth and related environmental sciences; Sparse matrix; 010302 applied physics; business.industry; Quantization (signal processing); Deep learning; Memory bandwidth; symbols; Artificial intelligence; business; Algorithm; Decoding methods |
Source: | CVPR |
DOI: | 10.1109/cvpr42600.2020.00198 |
Description: | Model compression techniques such as pruning and quantization are becoming increasingly important for reducing memory footprint and the amount of computation. Despite the reduction in model size, achieving performance gains on devices remains challenging, mainly because of the irregular representations used by sparse matrix formats. This paper proposes a new weight representation scheme for Sparse Quantized Neural Networks, specifically those obtained by fine-grained, unstructured pruning. The representation is encrypted in a structured, regular format that can be decoded efficiently and in parallel through an XOR-gate network during inference. We demonstrate that various deep learning models can be compressed and represented in the proposed format with a fixed, high compression ratio. For example, for the fully-connected layers of AlexNet on the ImageNet dataset, the sparse weights can be represented with only 0.28 bits/weight for 1-bit quantization and a 91% pruning rate, with a fixed decoding rate and full memory bandwidth usage. Decoding through the XOR-gate network incurs no loss in model accuracy, given additional patch data that adds only a small overhead. |
Database: | OpenAIRE |