Structured Dynamic Precision for Deep Neural Networks Quantization
Authors: | Kai Huang, Bowen Li, Dongliang Xiong, Haitian Jiang, Xiaowen Jiang, Xiaolang Yan, Luc Claesen, Dehong Liu, Junjian Chen, Zhili Liu |
---|---|
Year of publication: | 2023 |
Subject: | |
Source: | ACM Transactions on Design Automation of Electronic Systems. 28:1-24 |
ISSN: | 1557-7309 1084-4309 |
Description: | Deep Neural Networks (DNNs) have achieved remarkable success in various Artificial Intelligence applications. Quantization is a critical step in DNN compression and acceleration for deployment. To further boost DNN execution efficiency, many works exploit input-dependent redundancy by applying dynamic quantization to different regions. However, the sensitive regions in the feature map are irregularly distributed, which limits the real speedup on existing accelerators. To this end, we propose an algorithm-architecture co-design, named Structured Dynamic Precision (SDP). Specifically, we propose a quantization scheme in which the high-order bit part and the low-order bit part of the data can be masked independently. A fixed number of term parts is dynamically selected for computation based on the importance of each term in the group. We also present a hardware design that supports the algorithm efficiently with small overhead and whose inference time scales roughly in proportion to the precision. Evaluation experiments on extensive networks demonstrate that, compared to the state-of-the-art dynamic quantization accelerator DRQ, our SDP can achieve a 29% performance gain and a 51% energy reduction at the same level of model accuracy. |
Database: | OpenAIRE |
External link: |
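
The selection idea in the description (split each value into a high-order and a low-order bit part, then keep a fixed number of parts per group) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 4-bit split, the use of raw magnitude as the importance measure, and the names `split_terms` and `select_terms` are all assumptions made for illustration, since the record does not specify them.

```python
def split_terms(value, low_bits=4):
    """Split a non-negative integer into (high_part, low_part).

    Both parts keep their original weight, so high + low == value.
    The 4-bit boundary is an illustrative assumption.
    """
    low = value & ((1 << low_bits) - 1)
    high = value - low
    return high, low


def select_terms(group, keep, low_bits=4):
    """Keep the `keep` largest term parts in a group and mask the rest to zero.

    Importance is taken to be the magnitude of each part, which is an
    assumption; the record only says selection is importance-based.
    """
    terms = []
    for idx, value in enumerate(group):
        high, low = split_terms(value, low_bits)
        terms.append((high, idx))  # high-order part of element idx
        terms.append((low, idx))   # low-order part of element idx
    # A fixed budget per group keeps the amount of work regular.
    terms.sort(key=lambda t: t[0], reverse=True)
    approx = [0] * len(group)
    for magnitude, idx in terms[:keep]:
        approx[idx] += magnitude
    return approx


if __name__ == "__main__":
    group = [200, 3, 17, 90]
    print(select_terms(group, keep=4))  # half of the 8 term parts are kept
    print(select_terms(group, keep=8))  # all parts kept: values are unchanged
```

Because every group processes the same number of term parts, the work per group is constant regardless of where the large values fall, which is the kind of regularity the description contrasts with irregularly distributed sensitive regions.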