Basic Linear Algebra Operations on TensorCore GPU

Authors: Vivek Karihaloo, Shaoshuai Zhang, Panruo Wu
Year of publication: 2020
Source: ScalA@SC
Description: Driven by the demand for high-speed matrix computations and deep neural network training, TensorCore was introduced in NVIDIA GPUs to further accelerate matrix-matrix multiplication. It supports very fast half-precision general matrix-matrix multiplications (GEMMs), which are around 8× faster than single-precision CUDA core GEMMs. So far, the use of TensorCore GPUs for matrix operations other than matrix-matrix multiplication is underdeveloped. In this paper, we propose some efficient BLAS3 operations that exploit TensorCore. The experimental results show that the proposed algorithms outperform the corresponding cuBLAS routines and the naive TensorCore implementation, with up to 4.7× speedup.
Database: OpenAIRE