Stride 2 1-D, 2-D, and 3-D Winograd for Convolutional Neural Networks
Author: | Seok-Bum Ko, Juan Yepez |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 28:853-863 |
ISSN: | 1557-9999 1063-8210 |
DOI: | 10.1109/tvlsi.2019.2961602 |
Description: | Convolutional neural networks (CNNs) have been widely adopted for computer vision applications. CNNs require many multiplications, which makes them expensive in terms of both computational complexity and hardware. An effective way to reduce the number of required multiplications is the Winograd algorithm. Previous Winograd-based CNN implementations use the 2-D algorithm $F(2 \times 2, 3 \times 3)$, which reduces computational complexity by a factor of 2.25 over regular convolution. However, current Winograd implementations only apply when the stride (the shift displacement of a kernel over an input) is 1. In this article, we present a novel method to apply the Winograd algorithm to a stride of 2. This method is valid for one, two, or three dimensions. We also introduce new Winograd versions compatible with kernels of size 3, 5, and 7. The algorithms were successfully implemented on an NVIDIA K20c GPU. Compared to regular convolutions, the implementations for stride 2 are $1.44\times$ faster for a $3 \times 3$ kernel, $2.04\times$ faster for a $5 \times 5$ kernel, $2.42\times$ faster for a $7 \times 7$ kernel, and $1.73\times$ faster for a $3 \times 3 \times 3$ kernel. Additionally, a CNN accelerator using a novel processing element (PE) that performs two 2-D Winograd stride-1 operations, or one 2-D Winograd stride-2 operation, per clock cycle was implemented on an Intel Arria-10 field-programmable gate array (FPGA). We accelerated the original and our proposed modified VGG-16 architectures and achieved digital signal processor (DSP) efficiencies of 1.22 giga operations per second (GOPS)/DSPs and 1.33 GOPS/DSPs, respectively. |
Database: | OpenAIRE |
External link: |
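
The factor of 2.25 quoted in the description for $F(2 \times 2, 3 \times 3)$ can be illustrated with a minimal NumPy sketch of the standard stride-1 algorithm. The transform matrices below are the commonly published ones for this tile size, not taken from this record, and the function names are hypothetical. This is not the authors' stride-2 method, only the stride-1 baseline the description refers to.

```python
import numpy as np

# Commonly used transform matrices for Winograd F(2x2, 3x3), stride 1.
# Assumption: these come from the standard formulation, not from this record.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)


def winograd_2x2_3x3(d, g):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = G @ g @ G.T          # transformed kernel, 4x4
    V = BT @ d @ BT.T        # transformed input tile, 4x4
    M = U * V                # 16 element-wise multiplications
    return AT @ M @ AT.T     # inverse transform, 2x2


def direct_2x2_3x3(d, g):
    """Reference: sliding-window (CNN-style) convolution with stride 1."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)  # 9 multiplications
    return out


# Multiplication counts per 2x2 output tile (transforms not counted).
MULTS_DIRECT = 2 * 2 * 3 * 3   # 36
MULTS_WINOGRAD = 4 * 4         # 16

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
print(np.allclose(winograd_2x2_3x3(d, g), direct_2x2_3x3(d, g)))
print(MULTS_DIRECT / MULTS_WINOGRAD)
```

The ratio 36/16 counts only the element-wise products; the input, kernel, and output transforms add additions and, in practice, the kernel transform is usually precomputed once per filter.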