GEMM-Based Quantized Neural Network FPGA Accelerator Design

Author: Muhammad Rifqi Daffa Sudrajat, Trio Adiono, Infall Syafalni
Year of publication: 2019
Source: 2019 International Symposium on Electronics and Smart Devices (ISESD).
DOI: 10.1109/isesd.2019.8909538
Description: In this study, we explore FPGA acceleration of neural networks centered on accelerating General Matrix Multiplication (GEMM). GEMM acceleration enables a regular, modular accelerator design and provides the benefit of scalability. GEMM-based designs also offer a degree of functional flexibility, a key advantage given the rapidly evolving architectures of Deep Learning algorithms. We quantify the theoretical performance model and trade-offs of a GEMM accelerator and explore its design space. Moreover, we propose an accelerator design that exploits 8-bit quantization to increase effective bandwidth while preserving model accuracy, using the FPGA for model parallelization and data re-use to achieve high-performance, low-latency neural network inference. Lastly, we test and evaluate our design on the MNIST dataset. The proposed method optimizes hardware area in Deep Learning systems without sacrificing performance.
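To make the 8-bit quantized GEMM idea concrete, here is a minimal NumPy sketch. It uses symmetric linear quantization and int32 accumulation, mirroring how an FPGA MAC array would accumulate int8 products; the paper's exact quantization scheme is not specified in the abstract, so the quantizer below is an illustrative assumption, and all function names are hypothetical.

```python
import numpy as np

def quantize(x, n_bits=8):
    # Symmetric linear quantization: map floats to signed 8-bit integers.
    # (Illustrative scheme; the paper's exact quantizer is an assumption here.)
    scale = np.max(np.abs(x)) / (2 ** (n_bits - 1) - 1)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def quantized_gemm(a, b):
    # Quantize both operands, multiply with int32 accumulators
    # (as an FPGA MAC array would), then rescale back to float.
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc * (sa * sb)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)
ref = a @ b                       # float32 reference GEMM
approx = quantized_gemm(a, b)     # 8-bit quantized GEMM
rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
print(rel_err)
```

The small relative error shows why 8-bit operands can quadruple memory bandwidth relative to float32 while keeping model accuracy largely intact, which is the trade-off the accelerator exploits.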
Database: OpenAIRE