High performance deep neural network on low cost mobile GPU

Authors: Alex Pai, Chien-Ping Lu, Shu-Jen Lai, Sung-Fang Tsai, Pei-Kuei Tsung
Year of publication: 2016
Subject:
Source: ICCE
DOI: 10.1109/icce.2016.7430525
Description: In recent years, machine learning based on deep neural networks (DNNs) has played an increasingly important role, and artificial intelligence applications built on DNNs are reaching ever higher accuracy. However, the multi-layer structure of a DNN imposes very high computational requirements. To run DNN applications feasibly on mobile devices, an efficient DNN flow optimized for a mobile GPU is needed. This paper proposes a mobile-GPU-accelerated DNN flow. Using the proposed input buffer address remapping scheme, shader assembly code optimization, and kernel merging between computing nodes, the flow achieves 10.6 FPS on a 35.2 GFLOPS mobile GPU at 94.9 mJ per frame, a 58x speedup and 104x better energy efficiency than a pure mobile CPU solution. Compared with state-of-the-art GPU accelerator devices and libraries, the proposed scheme provides 226% to 1000% higher computing efficiency.
Database: OpenAIRE
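
Note: the figures quoted in the description imply a few further quantities that the record does not state. The sketch below back-derives them as a rough sanity check. It assumes that the 94.9 mJ figure covers a full frame period with frames processed back to back, and that the 58x and 104x ratios apply directly to frame rate and energy per frame. The function name `implied_figures` and the dictionary keys are hypothetical, and the derived values are estimates, not results reported by the authors.

```python
# Figures quoted in the abstract.
GPU_FPS = 10.6                  # frames per second on the mobile GPU
GPU_ENERGY_MJ_PER_FRAME = 94.9  # millijoules per frame on the mobile GPU
GPU_PEAK_GFLOPS = 35.2          # peak throughput of the mobile GPU
SPEEDUP_VS_CPU = 58.0           # GPU frame rate / CPU frame rate
ENERGY_GAIN_VS_CPU = 104.0      # CPU energy per frame / GPU energy per frame


def implied_figures():
    """Back-derive quantities implied by the abstract (estimates only)."""
    # mJ/frame * frames/s = mW; divide by 1000 to get watts.
    gpu_avg_power_w = GPU_ENERGY_MJ_PER_FRAME * GPU_FPS / 1000.0

    # Assumed CPU baseline, obtained by inverting the reported ratios.
    cpu_fps = GPU_FPS / SPEEDUP_VS_CPU
    cpu_seconds_per_frame = 1.0 / cpu_fps
    cpu_energy_j_per_frame = GPU_ENERGY_MJ_PER_FRAME * ENERGY_GAIN_VS_CPU / 1000.0
    cpu_avg_power_w = cpu_energy_j_per_frame / cpu_seconds_per_frame

    # Upper bound on work per frame if the GPU ran at peak the whole time.
    peak_gflop_budget_per_frame = GPU_PEAK_GFLOPS / GPU_FPS

    return {
        "gpu_avg_power_w": gpu_avg_power_w,
        "cpu_fps": cpu_fps,
        "cpu_seconds_per_frame": cpu_seconds_per_frame,
        "cpu_energy_j_per_frame": cpu_energy_j_per_frame,
        "cpu_avg_power_w": cpu_avg_power_w,
        "peak_gflop_budget_per_frame": peak_gflop_budget_per_frame,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the GPU draws roughly 1 W on average while running, and the implied CPU baseline needs about 5.5 seconds and close to 10 J per frame.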