A Real-Time Architecture for Pruning the Effectual Computations in Deep Neural Networks

Authors: Hyuk-Jae Lee, Lakshminarayanan Gopalakrishnan, Mohammadreza Asadikouhanjani, Hao Zhang, Seok-Bum Ko
Year: 2021
Source: IEEE Transactions on Circuits and Systems I: Regular Papers. 68:2030-2041
ISSN: 1558-0806, 1549-8328
DOI: 10.1109/tcsi.2021.3060945
Abstract: Integrating Deep Neural Networks (DNNs) into Internet of Things (IoT) devices could enable complex sensing and recognition tasks that support a new era of human interaction with surrounding environments. However, DNNs are power-hungry, performing billions of computations per inference. Spatial DNN accelerators, in principle, can support computation-pruning techniques more readily than other common architectures such as systolic arrays. Energy-efficient DNN accelerators exploit bit-wise or word-wise sparsity in the input feature maps (ifmaps) and filter weights to skip ineffectual computations. However, there is still room for pruning the effectual computations without reducing the accuracy of DNNs. In this paper, we propose a novel real-time architecture and dataflow that decomposes multiplications down to the bit level and prunes identical computations in spatial designs while running benchmark networks. The proposed architecture prunes identical computations by identifying identical bit values present in both the ifmaps and the filter weights, without changing the accuracy of the benchmark networks. Compared to the reference design, the proposed design achieves an average per-layer speedup of 1.4$\times$ and a 1.21$\times$ improvement in energy efficiency per inference while maintaining the accuracy of the benchmark networks.
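The core idea of the abstract, decomposing a multiplication into bit-level partial products and pruning computations that are identical, can be illustrated with a minimal sketch. This is a hypothetical software analogy, not the paper's hardware dataflow: when many activations are multiplied by the same weight, the shifted partial product `weight << i` is identical for every activation whose i-th bit is set, so it can be computed once and reused (pruned) thereafter. All names below (`prune_identical_bit_products`, `partials`) are illustrative.

```python
def prune_identical_bit_products(activations, weight):
    """Bit-serial multiplication of several activations by one weight.

    Illustrative sketch only: each effectual partial product
    (weight << i, for a set activation bit i) is computed once and
    cached; later occurrences of the same bit position reuse the
    cached value, mimicking the pruning of identical computations.
    """
    partials = {}            # bit position -> weight << i, computed once
    computed = reused = 0    # counters for first-time vs. pruned terms
    results = []
    for a in activations:
        acc = 0
        i = 0
        while a >> i:                      # iterate over bits of a
            if (a >> i) & 1:               # effectual bit (non-zero)
                if i not in partials:
                    partials[i] = weight << i
                    computed += 1
                else:
                    reused += 1            # identical computation pruned
                acc += partials[i]
            i += 1
        results.append(acc)
    return results, computed, reused

# Multiplying 3, 5, 7 by the weight 6: 7 effectual bit products in
# total, but only 3 distinct shifts ever need to be computed.
outs, computed, reused = prune_identical_bit_products([3, 5, 7], 6)
# outs == [18, 30, 42], computed == 3, reused == 4
```

The ratio of reused to total effectual terms is a rough software proxy for the savings the architecture targets; the paper additionally skips ineffectual (zero-bit) products, which the sketch never generates in the first place.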
Database: OpenAIRE