TF-Net
Authors: Reetuparna Das, Andrew Lukefahr, Jiecao Yu, Scott Mahlke
Year: 2019
Subjects: Computer science; Pipeline (computing); Computation; Byte; Cloud computing; Computer hardware & architecture; Instruction set; Microcontroller; Computer engineering; Hardware and Architecture; Overhead (computing); Software; Efficient energy use
Source: ACM Transactions on Embedded Computing Systems 18:1–21
ISSN: 1558-3465; 1539-9087
DOI: 10.1145/3358189
Description: Deep Neural Networks (DNNs) have become an essential component of many applications. While today’s DNNs mostly run as cloud services, concerns about network connectivity, energy, and data privacy make it important to support efficient DNN computation on low-cost, low-power processors such as microcontrollers. However, their constrained computation resources make it challenging to execute large DNN models on microcontrollers. Using sub-byte, low-precision input activations and weights is a common way to reduce DNN computation, but byte-addressable microcontrollers do not support sub-byte computation well: sub-byte inputs and weights must first be unpacked from bitstreams, which incurs significant computation and energy overhead. In this paper, we propose the TF-Net pipeline to efficiently deploy sub-byte DNNs on microcontrollers. While TF-Net allows a range of weight and input precisions, we find that Ternary weights and Four-bit inputs provide the best balance between model accuracy, computation performance, and energy efficiency. TF-Net first includes a training framework for sub-byte, low-precision DNN models. Two algorithms are then introduced to accelerate the trained models. The first, direct buffer convolution, amortizes unpacking overhead by caching unpacked inputs. The second, packed sub-byte multiply-accumulate, uses a single multiplication instruction to perform multiple sub-byte multiply-accumulate computations. To further accelerate DNN computation, we propose two instructions, Multiply-Shift-Accumulate and Unpack, that extend the existing microcontroller instruction set. On the tested networks, TF-Net improves computation performance and energy efficiency by 1.83× and 2.28× on average, respectively.
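To make the two software techniques in the abstract concrete, here is a minimal C sketch, not the paper's implementation. The names `unpack_nibbles` and `packed_mac`, the 16-bit field spacing, and the restriction to unsigned weights are illustrative assumptions; the paper's actual data layouts, and its sign handling for ternary weights, may differ. The key invariant is that each 4-bit activation times a small weight fits inside its 16-bit field, so one 32-bit multiply yields two partial products at once.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of the unpack step behind direct buffer convolution (as the
 * abstract describes it): expand a stream of packed 4-bit activations
 * into a byte buffer once, so every filter that touches these inputs
 * reuses the unpacked copy instead of re-unpacking the bitstream.
 * Assumes n is even. */
static void unpack_nibbles(const uint8_t *packed, uint8_t *out, int n)
{
    for (int i = 0; i < n / 2; i++) {
        out[2 * i]     = packed[i] & 0x0Fu;        /* low nibble  */
        out[2 * i + 1] = (packed[i] >> 4) & 0x0Fu; /* high nibble */
    }
}

/* Sketch of a packed sub-byte multiply-accumulate: two 4-bit inputs
 * share one 32-bit register at 16-bit spacing, so a single hardware
 * multiply produces two partial products in disjoint bit fields.
 * Since a_i * w <= 15 * 15 = 225 < 2^16, no product can carry across
 * the field boundary. */
static void packed_mac(uint32_t *acc0, uint32_t *acc1,
                       uint8_t a0, uint8_t a1, uint8_t w)
{
    uint32_t packed = (uint32_t)a0 | ((uint32_t)a1 << 16);
    uint32_t prod   = packed * w;      /* one multiply, two products */
    *acc0 += prod & 0xFFFFu;           /* a0 * w */
    *acc1 += prod >> 16;               /* a1 * w */
}

int main(void)
{
    /* Two activations packed into one byte: low nibble 7, high nibble 13. */
    const uint8_t packed_in[1] = { (13u << 4) | 7u };
    uint8_t a[2];
    unpack_nibbles(packed_in, a, 2);   /* a = {7, 13} */

    uint32_t acc0 = 0, acc1 = 0;
    packed_mac(&acc0, &acc1, a[0], a[1], 3);
    printf("%u %u\n", acc0, acc1);     /* prints "21 39" */
    return 0;
}
```

The same spacing trick generalizes to more fields per register, as long as each partial product provably stays within its field width.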
Database: OpenAIRE
External link: