Bitwise Neural Network Acceleration: Opportunities and Challenges
Author: | van Lier, Michel; Waeijen, Luc; Corporaal, Henk; Stojanovic, Radovan; Jozwiak, Lech; Lutovac, Budimir; Jurisic, Drazen |
Contributors: | Electronic Systems |
Year of publication: | 2019 |
Subject: | Speedup; Artificial neural network; Computer science; Memory bandwidth; Convolutional neural network; Computer hardware & architecture; Reduction (complexity); Computer engineering; Electrical engineering; Field-programmable gate array; Bitwise operation; Massively parallel |
Source: | MECO 2019 - 8th Mediterranean Conference on Embedded Computing, Proceedings |
DOI: | 10.1109/meco.2019.8760178 |
Description: | Real-time inference of deep convolutional neural networks (CNNs) on embedded systems and SoCs would enable many interesting applications. However, these CNNs are computation- and data-intensive, making it difficult to execute them in real time on energy-constrained embedded platforms. Recent research has shown that light-weight CNNs with quantized model weights and activations constrained to a single bit {-1, +1} can still achieve reasonable accuracy compared to the non-quantized 32-bit model. These binary neural networks (BNNs) theoretically allow a drastic reduction of the required energy and run-time by shrinking the memory size, reducing the number of memory accesses, and lowering the computational cost by replacing expensive two's complement arithmetic operations with more efficient bitwise versions. To exploit these advantages, we propose a bitwise CNN accelerator (BNNA) mapped onto an FPGA. We implement the Hubara'16 network [1] on the Xilinx Zynq-7020 SoC. Massive parallelism is achieved by performing 4608 binary MACs in parallel, which enables real-time speeds of up to 110 fps while using only 22% of the FPGA LUTs. In comparison to a 32-bit network, a speedup of 32 times and a resource reduction of 40 times are achieved, with the memory bandwidth as the main bottleneck. The provided detailed analysis of the carefully crafted accelerator design exposes the challenges and opportunities in bitwise neural network accelerator design. |
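The description notes that BNNs replace two's complement multiply-accumulate operations with bitwise equivalents. A minimal sketch (not from the paper) of the standard reduction: for vectors over {-1, +1} packed into bit masks, the dot product collapses to one XOR plus a population count, which is what makes massively parallel binary MACs cheap in FPGA logic:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n vectors over {-1, +1},
    packed as bit masks (bit = 1 encodes +1, bit = 0 encodes -1).

    Matching bit positions contribute +1, differing ones -1,
    so dot(a, b) = n - 2 * popcount(a XOR b).
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# Example: a = [+1, -1, +1], b = [+1, +1, -1]
# dot = (+1)(+1) + (-1)(+1) + (+1)(-1) = -1
print(binary_dot(0b101, 0b011, 3))  # -1
```

In hardware this maps to an XNOR/XOR array feeding a popcount adder tree, replacing a full-width multiplier per MAC lane.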
Database: | OpenAIRE |
External link: |