Efficient cuDNN-Compatible Convolution-Pooling on the GPU

Authors: Shunsuke Suita, Akihiko Kasagi, Takahiro Nishimura, Tsuguchika Tabaru, Yasuaki Ito, Hiroki Tokura, Koji Nakano
Year of publication: 2020
Subject:
Source: Parallel Processing and Applied Mathematics, ISBN: 9783030432218
PPAM (2)
Description: The main contribution of this paper is to show efficient implementations of convolution-pooling on the GPU, in which a pooling operation follows a multiple convolution. Because multiple convolution and pooling are performed alternately in the early stages of many Convolutional Neural Networks (CNNs), accelerating convolution-pooling is very important. Our new GPU implementation uses two techniques: (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. These techniques reduce both the computational cost and the memory access cost. Furthermore, the convolution interchange is converted to a matrix multiplication, which cuBLAS can compute very efficiently. Experimental results on a Tesla V100 GPU show that our new cuDNN-compatible implementation of convolution-pooling is at least 1.34 times faster than performing the multiple convolution and then the pooling with cuDNN, the most popular library of primitives for implementing CNNs on the GPU.
Database: OpenAIRE
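
The two techniques named in the description can be illustrated with a small CPU-only sketch. This is one possible reading, not the authors' GPU implementation: it assumes a single input channel, a single filter, and non-overlapping sum pooling (average pooling up to a constant factor), none of which the record specifies. Under those assumptions, pooling the convolution output equals a stride-p convolution of sliding window sums of the input, and that strided convolution can be written as a matrix-vector product over unrolled patches (the step a GPU version would hand to a GEMM routine). All function names below are hypothetical.

```python
def conv2d_valid(x, k, stride=1):
    """Valid 2D cross-correlation of x with kernel k at the given stride."""
    kh, kw = len(k), len(k[0])
    oh = (len(x) - kh) // stride + 1
    ow = (len(x[0]) - kw) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def sum_pool(y, p):
    """Non-overlapping p x p sum pooling (assumed pooling type)."""
    return [[sum(y[i * p + di][j * p + dj]
                 for di in range(p) for dj in range(p))
             for j in range(len(y[0]) // p)] for i in range(len(y) // p)]


def window_sum(x, p):
    """Stride-1 sliding p x p sums of the input (the 'direct sum' in this sketch)."""
    return conv2d_valid(x, [[1] * p for _ in range(p)])


def conv_pool_reference(x, k, p):
    """Baseline: full convolution first, then pooling."""
    return sum_pool(conv2d_valid(x, k), p)


def conv_pool_interchanged(x, k, p):
    """Interchanged order: sum input windows first, then convolve with stride p.

    Both steps are linear, so the result matches the baseline while the
    multiply count of the convolution drops by roughly a factor of p * p.
    """
    return conv2d_valid(window_sum(x, p), k, stride=p)


def conv_pool_matmul(x, k, p):
    """Same computation expressed as a matrix-vector product over patches."""
    s = window_sum(x, p)
    kh, kw = len(k), len(k[0])
    oh = (len(s) - kh) // p + 1
    ow = (len(s[0]) - kw) // p + 1
    # One row per output position, holding the unrolled kh x kw patch.
    patches = [[s[i * p + a][j * p + b] for a in range(kh) for b in range(kw)]
               for i in range(oh) for j in range(ow)]
    weights = [v for row in k for v in row]
    flat = [sum(c * w for c, w in zip(row, weights)) for row in patches]
    return [flat[i * ow:(i + 1) * ow] for i in range(oh)]


if __name__ == "__main__":
    x = [[(i * 7 + j * 3) % 5 for j in range(6)] for i in range(6)]
    k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    ref = conv_pool_reference(x, k, 2)
    print(conv_pool_interchanged(x, k, 2) == ref)
    print(conv_pool_matmul(x, k, 2) == ref)
```

With multiple input channels and filters, the patch matrix gains columns and the weight vector becomes a matrix, so the last step turns into a full matrix multiplication; how the paper lays out those matrices for cuBLAS is not stated in this record.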