GEMM-Based Quantized Neural Network FPGA Accelerator Design

Author: Muhammad Rifqi Daffa Sudrajat, Trio Adiono, Infall Syafalni
Year of publication: 2019
Source: 2019 International Symposium on Electronics and Smart Devices (ISESD).
DOI: 10.1109/isesd.2019.8909538
Description: In this study, we explore FPGA acceleration of neural networks centered on accelerating General Matrix Multiplication (GEMM). GEMM acceleration enables a regular, modular accelerator design and provides the benefit of scalability. GEMM-based designs also offer a degree of functional flexibility, a key advantage given the rapidly evolving architectures of Deep Learning algorithms. We quantify the theoretical performance model and trade-offs of a GEMM accelerator and explore its design space. Moreover, we propose an accelerator design that exploits 8-bit quantization to increase effective bandwidth while preserving model accuracy, using the FPGA for model parallelization and data re-use to achieve high-performance, low-latency neural network inference. Lastly, we test and evaluate our design on the MNIST dataset. The proposed method optimizes hardware area in Deep Learning systems without sacrificing performance.
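To make the 8-bit quantized GEMM idea concrete, here is a minimal NumPy sketch. It uses symmetric linear quantization and int32 accumulation, mirroring how an FPGA MAC array would accumulate int8 products; the paper's exact quantization scheme is not specified in the abstract, so the quantizer below is an illustrative assumption, and all function names are hypothetical.

```python
import numpy as np

def quantize(x, n_bits=8):
    # Symmetric linear quantization: map floats to signed 8-bit integers.
    # (Illustrative scheme; the paper's exact quantizer is an assumption here.)
    scale = np.max(np.abs(x)) / (2 ** (n_bits - 1) - 1)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def quantized_gemm(a, b):
    # Quantize both operands, multiply with int32 accumulators
    # (as an FPGA MAC array would), then rescale back to float.
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc * (sa * sb)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)
ref = a @ b                       # float32 reference GEMM
approx = quantized_gemm(a, b)     # 8-bit quantized GEMM
rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
print(rel_err)
```

The small relative error shows why 8-bit operands can quadruple memory bandwidth relative to float32 while keeping model accuracy largely intact, which is the trade-off the accelerator exploits.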
Database: OpenAIRE