Embedded Streaming Deep Neural Networks Accelerator With Applications
Author: | Eugenio Culurciello, Berin Martini, Jonghoon Jin, Aysegul Dundar |
---|---|
Year of publication: | 2017 |
Subject: | Artificial neural network; Computer Networks and Communications; Computer science; Real-time computing; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; Object detection; Computer Science Applications; Artificial Intelligence; Gate array; Embedded system; 0202 electrical engineering, electronic engineering, information engineering; Hardware acceleration; 020201 artificial intelligence & image processing; Compiler; Field-programmable gate array; Throughput (business); Software; 0105 earth and related environmental sciences |
Source: | IEEE Transactions on Neural Networks and Learning Systems, vol. 28, pp. 1572-1583 |
ISSN: | 2162-2388, 2162-237X |
DOI: | 10.1109/tnnls.2016.2545298 |
Description: | Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput in the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for a DCNN hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of a DCNN into operation codes that execute the application on the hardware accelerator. The proposed method makes maximal use of the available computational resources through a novel scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully exploits weight-level and node-level parallelism in DCNNs and achieves a peak performance of 247 G-ops/s while consuming less than 4 W of power. We test our system on object classification and object detection applications in real-world scenarios. Our results show high performance efficiency, outperforming all other platforms compared on these applications. |
Database: | OpenAIRE |
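The abstract describes the streaming method as a compiler that lowers a high-level DCNN representation into operation codes for the accelerator. The sketch below is a toy, hypothetical illustration of that general idea in Python, not the authors' compiler: the layer classes, the opcode mnemonics (`LOAD_WEIGHTS`, `CONV`, `RELU`, `MAXPOOL`, `HALT`), the tuple encoding, and the `num_units` parameter are all assumptions. It further assumes that node-level parallelism means computing several output maps at once on separate compute units. It does not model the paper's routing topology, data reuse, or data concatenation.

```python
"""Toy lowering pass: turn a high-level DCNN description into a flat list of
operation codes for a HYPOTHETICAL streaming accelerator.

All names here (layer classes, opcode mnemonics, tuple layout, num_units) are
illustrative assumptions, not the instruction set described in the paper.
"""
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Conv:
    in_maps: int   # number of input feature maps
    out_maps: int  # number of output feature maps ("nodes" in this sketch)
    kernel: int    # square kernel size k (k x k weights per map pair)


@dataclass(frozen=True)
class ReLU:
    pass


@dataclass(frozen=True)
class MaxPool:
    size: int  # pooling window size


Layer = Union[Conv, ReLU, MaxPool]
Op = Tuple[object, ...]  # an opcode: (mnemonic, operands...)


def compile_network(layers: List[Layer], num_units: int) -> List[Op]:
    """Lower `layers` into an opcode stream.

    Assumed hardware model: `num_units` compute units work on different
    output maps at the same time (node-level parallelism). Each convolution
    is therefore split into groups of at most `num_units` output maps, and
    every group is preceded by a weight load for that group. The CONV opcode
    carries k so that, in this assumed model, a unit could evaluate a whole
    k x k window per step (weight-level parallelism).
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    program: List[Op] = []
    for idx, layer in enumerate(layers):
        if isinstance(layer, Conv):
            for start in range(0, layer.out_maps, num_units):
                group = min(num_units, layer.out_maps - start)
                program.append(("LOAD_WEIGHTS", idx, start, group))
                program.append(("CONV", idx, start, group, layer.kernel))
        elif isinstance(layer, ReLU):
            program.append(("RELU", idx))
        elif isinstance(layer, MaxPool):
            program.append(("MAXPOOL", idx, layer.size))
        else:
            raise TypeError(f"unsupported layer: {layer!r}")
    program.append(("HALT",))  # end of the instruction stream
    return program


if __name__ == "__main__":
    net = [Conv(in_maps=3, out_maps=10, kernel=5), ReLU(), MaxPool(size=2)]
    for op in compile_network(net, num_units=4):
        print(op)
```

Running the module prints nine opcodes: the 5x5 convolution from 3 to 10 maps is split into groups of 4, 4, and 2 output maps, followed by the ReLU, pooling, and halt opcodes. As a separate back-of-envelope figure using only the abstract's own numbers, 247 G-ops/s at under 4 W implies a peak efficiency above 247 / 4 = 61.75 G-ops/s per watt.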