A Real-Time Architecture for Pruning the Effectual Computations in Deep Neural Networks
Authors: | Hyuk-Jae Lee, Lakshminarayanan Gopalakrishnan, Mohammadreza Asadikouhanjani, Hao Zhang, Seok-Bum Ko |
Year: | 2021 |
Subject: | Speedup, Artificial neural network, Computer science, Dataflow, Reference design, Sorting, Parallel computing, Filter (video), Benchmark (computing), Pruning (decision trees), Electrical and Electronic Engineering |
Source: | IEEE Transactions on Circuits and Systems I: Regular Papers. 68:2030-2041 |
ISSN: | 1558-0806, 1549-8328 |
DOI: | 10.1109/tcsi.2021.3060945 |
Abstract: | Integrating Deep Neural Networks (DNNs) into Internet of Things (IoT) devices could enable complex sensing and recognition tasks that support a new era of human interaction with the surrounding environment. However, DNNs are power-hungry, performing billions of computations per inference. Spatial DNN accelerators can, in principle, support computation-pruning techniques more readily than other common architectures such as systolic arrays. Energy-efficient DNN accelerators exploit bit-wise or word-wise sparsity in the input feature maps (ifmaps) and filter weights to skip ineffectual computations. However, there is still room to prune the effectual computations without reducing the accuracy of DNNs. In this paper, we propose a novel real-time architecture and dataflow that decompose multiplications down to the bit level and prune identical computations in spatial designs while running benchmark networks. The proposed architecture prunes identical computations by identifying identical bit values present in both the ifmaps and the filter weights, without changing the accuracy of the benchmark networks. Compared to the reference design, the proposed design achieves an average per-layer speedup of $1.4\times$ and an energy-efficiency improvement of $1.21\times$ per inference while maintaining the accuracy of the benchmark networks. |
Database: | OpenAIRE |
External link: |
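The core idea in the abstract can be illustrated with a minimal sketch. This is a hypothetical software analogy, not the paper's actual hardware dataflow: each multiplication is decomposed into shifted partial products over the non-zero bits of the operand, and a partial product already computed for an identical (weight, bit-position) pair is reused from a cache instead of being recomputed, which prunes repeated effectual work.

```python
def bit_decomposed_multiply(x, w, cache):
    """Compute x * w as a sum of shifted partial products w << i,
    one per set bit i of x, memoizing each (w, i) pair in `cache`
    so identical bit-level computations are performed only once.

    This is an illustrative sketch of bit-level decomposition with
    reuse of identical computations; the paper realizes the idea in
    a spatial accelerator dataflow, not in software."""
    acc = 0
    i = 0
    while x >> i:
        if (x >> i) & 1:            # only effectual (non-zero) bits contribute
            key = (w, i)
            if key not in cache:    # identical computation seen before? reuse it
                cache[key] = w << i
            acc += cache[key]
        i += 1
    return acc

cache = {}
print(bit_decomposed_multiply(13, 7, cache))  # 91 (13 * 7)
print(bit_decomposed_multiply(13, 7, cache))  # 91, served entirely from cache
```

In hardware, the cache lookup corresponds to detecting that two processing elements are about to perform the same bit-level operation and forwarding one result instead of executing both; the software dictionary above only models that reuse.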