Toward an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2-D and 3-D CNNs on FPGA.

Autor:	Shen, Junzhong, Huang, You, Wen, Mei, Zhang, Chunyuan
Předmět:	CONVOLUTIONAL neural networks GRAPHICS processing units COMPUTER vision FIELD programmable gate arrays APPLICATION software
Zdroj:	IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems; Jul2020, Vol. 39 Issue 2, p1442-1455, 14p
Abstrakt:	3-D convolutional neural networks (3-D CNNs) are used efficiently in many computer vision applications. Most previous work in this area has concentrated only on design and optimization of accelerators for 2-D CNNs, with few attempts having been made to accelerate 3-D CNNs on FPGA. We find the acceleration of 3-D CNNs on FPGA to be challenging due to their high computational complexity and storage demands. More importantly, although the computational patterns of 2-D and 3-D CNNs are analogous, the conventional approaches that have been adopted for acceleration of 2-D CNNs may be unfit for 3-D CNN acceleration. In this paper, in order to accelerate 2-D and 3-D CNNs using a uniform framework, we first propose a uniform template-based architecture that uses templates based on the Winograd algorithm to ensure the rapid development of 2-D and 3-D CNN accelerators. Then, with the aim of efficiently mapping all layers of 2-D /3-D CNNs onto a pipelined accelerator, techniques are developed to improve the throughput and computational efficiency of the accelerator, including layer fusion, layer clustering, and workload-balancing scheme. Finally, we demonstrate the effectiveness of the deep pipelined architecture by accelerating real-life 2-D and 3-D CNNs on the state-of-the-art FPGA platform. On VCU118, we achieve 3.7 TOPS for VGG-16, which outperforms state-of-the-art FPGA-based CNN accelerators. Comparisons with CPU and GPU solutions demonstrate that our implementation of 3-D CNN achieves gains of up to $17.8\times $ and $64.2\times $ in performance and energy relative to a CPU solution, and a $5.0\times $ energy efficiency gain over a GPU solution. [ABSTRACT FROM AUTHOR]
Databáze:	Complementary Index
Externí odkaz:	Zobrazit plný text záznamu