3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA
Author: | Jian Wang, Zhiyi Yu, Xiangyu Meng, Shanlin Xiao, Huafeng Ye, Huipeng Deng |
---|---|
Year of publication: | 2021 |
Subject: |
Computational complexity theory; business.industry; Computer science; Computation; 020208 electrical & electronic engineering; 02 engineering and technology; Convolutional neural network; 020202 computer hardware & architecture; Convolution; Computational science; 0202 electrical engineering, electronic engineering, information engineering; business; Field-programmable gate array; Digital signal processing; Efficient energy use; Geometric data analysis |
Source: | FCCM |
DOI: | 10.1109/fccm51124.2021.00029 |
Description: | Three-dimensional convolutional neural networks (3D CNNs) have proven outstanding in applications such as video analysis, 3D geometric data, and 3D medical image diagnosis. Compared with 2D CNNs, 3D CNNs incur high computational complexity to extract spatio-temporal features, and the Winograd algorithm can significantly reduce the amount of computation. Prior 3D Winograd accelerators apply only to stride-1 convolution; however, most popular 3D CNNs contain stride-2 convolution layers. In this paper, we propose a novel flexible Winograd-based decomposition method (FWDM) that applies 3D Winograd to convolutions with different strides. Evaluation results show that FWDM reduces computational complexity by a factor of 3.2 for C3D, 2.9 for 3D ConvNet, and 2.6 for 3D ResNet-18. Furthermore, we design a flexible computing engine to extend the range of use of the decomposition method. Coupling FWDM with the computing engine, we propose 3D-VNPU, a Winograd-based, highly efficient, and flexible accelerator compatible with both 2D and 3D CNNs. Finally, we demonstrate the effectiveness of 3D-VNPU on an FPGA platform (Xilinx ZCU102), achieving 1.35 TOPS for C3D, 1.2 TOPS for 3D ResNet-18, and 1.1 TOPS for VGG-16. For C3D, its DSP efficiency is 2.57x to 15.3x higher than that of prior FPGA accelerators. Compared with CPU and GPU, our accelerator achieves up to 37.9x higher performance than the CPU and up to 11.8x higher energy efficiency than the GPU. |
Database: | OpenAIRE |
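The abstract does not spell out how FWDM maps a stride-2 convolution onto stride-1 Winograd units. As a rough illustration of the general idea only, the sketch below uses the standard even/odd (polyphase) split in one dimension, which may differ from the authors' actual method: a stride-2 correlation is rewritten as the sum of two stride-1 correlations over the even- and odd-indexed samples. The function names `correlate_1d` and `stride2_via_stride1` are hypothetical and chosen for this example.

```python
def correlate_1d(x, w, stride=1):
    """Direct 'valid' 1D correlation: y[i] = sum_k x[i*stride + k] * w[k]."""
    n_out = (len(x) - len(w)) // stride + 1
    return [
        sum(x[i * stride + k] * w[k] for k in range(len(w)))
        for i in range(n_out)
    ]


def stride2_via_stride1(x, w):
    """Compute a stride-2 correlation using only stride-1 correlations.

    Assumption: plain even/odd phase split, not necessarily the paper's FWDM.
    With x_even = x[0::2], x_odd = x[1::2] (and likewise for w):
        y[i] = sum_j x_even[i + j] * w_even[j] + sum_j x_odd[i + j] * w_odd[j]
    """
    n_out = (len(x) - len(w)) // 2 + 1
    total = [0] * n_out
    for phase in (0, 1):
        x_p, w_p = x[phase::2], w[phase::2]
        if not w_p:
            # A length-1 kernel has no odd-indexed taps.
            continue
        # Each phase is an ordinary stride-1 correlation, the case that
        # stride-1 Winograd hardware is built to handle.
        part = correlate_1d(x_p, w_p, stride=1)
        for i in range(n_out):
            total[i] += part[i]
    return total


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
w = [1, 2, 3]
direct = correlate_1d(x, w, stride=2)
decomposed = stride2_via_stride1(x, w)
print(direct, decomposed)
```

Applying the same split independently along each of the three axes of a 3D convolution yields 2^3 = 8 stride-1 sub-convolutions whose outputs are summed. Whether FWDM organizes the sub-kernels in exactly this way is not stated in the abstract.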