Tensor Layout Optimization of Convolution for Inference on Digital Signal Processor
Author: | Xiaobin Zhang, Xiaoyang Zhang, Guangming Tan, Tian Zhongbo, Junmin Xiao, Zhu Hongrui, Hu Zhongzhe |
---|---|
Year of publication: | 2019 |
Subject: |
010302 applied physics; Digital signal processor; Artificial neural network; business.industry; Computer science; 02 engineering and technology; 01 natural sciences; 020202 computer hardware & architecture; Convolution; Tensor (intrinsic definition); 0103 physical sciences; Computer data storage; 0202 electrical engineering, electronic engineering, information engineering; Layer (object-oriented design); business; Algorithm; Digital signal processing; 5G |
Source: | ISPA/BDCloud/SocialCom/SustainCom |
DOI: | 10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00036 |
Description: | The development of artificial intelligence and 5G technology has driven the Internet of Things into the era of the Artificial Intelligence of Things (AIoT). As part of AIoT, the Digital Signal Processor (DSP) plays an important role in many AI chips. It performs multiply-accumulate calculations quickly. Unlike GPUs, which usually apply general matrix multiplication to convolution calculations, DSPs compute convolution directly. Therefore, most optimization techniques for GPUs do not suit DSPs. In many AIoT scenarios, the inputs and outputs of convolution layers are usually irregular, which affects the efficiency of data storage. To make full use of the DSP, this work analyzes the performance effects of different tensor layouts. Based on this analysis, a hybrid tensor layout optimization approach is proposed. Furthermore, the test results show that the empirical criterion for layout selection is not very accurate. Hence, an automatic tuning approach is designed to accurately choose suitable layouts for the different convolution layers in a neural network. Numerical experiments show that, for the convolution calculation of each layer in MobileNet, the hybrid tensor layout optimization approach achieves a speed-up of up to 11x compared with single-layout schedules. Furthermore, the accuracy of the automatic approach reaches 99.27% for ordinary convolution, and the convolution calculation of a whole network under the tensor layouts chosen by the automatic approach performs 72.8% better than under the common layout. |
Database: | OpenAIRE |
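
The description refers to direct convolution, alternative tensor layouts, and per-layer automatic layout selection. The sketch below is only an illustration of those ideas, not the authors' implementation: the layout names `CHW` and `HWC`, the functions `to_hwc`, `direct_conv`, and `pick_layout`, and the timing-based selection rule are all assumptions introduced here. It runs the same direct (loop-nest) convolution over one input stored in two layouts and then picks whichever layout ran faster for that layer.

```python
import time


def to_hwc(x_chw, C, H, W):
    """Reorder a flat CHW buffer into a flat HWC buffer (hypothetical helper)."""
    return [x_chw[(c * H + h) * W + w]
            for h in range(H) for w in range(W) for c in range(C)]


def direct_conv(x, wt, C, H, W, K, R, S, layout):
    """Direct convolution, stride 1, no padding, single image.

    x is a flat input buffer stored in `layout` ("CHW" or "HWC").
    wt is a flat weight buffer in KCRS order.
    Returns a flat output buffer in (K, OH, OW) order.
    """
    OH, OW = H - R + 1, W - S + 1
    if layout == "CHW":
        def index(c, h, w):
            return (c * H + h) * W + w
    elif layout == "HWC":
        def index(c, h, w):
            return (h * W + w) * C + c
    else:
        raise ValueError(f"unknown layout: {layout}")

    out = [0.0] * (K * OH * OW)
    for k in range(K):
        for oh in range(OH):
            for ow in range(OW):
                acc = 0.0
                # Multiply-accumulate over the receptive field.
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += (x[index(c, oh + r, ow + s)]
                                    * wt[((k * C + c) * R + r) * S + s])
                out[(k * OH + oh) * OW + ow] = acc
    return out


def pick_layout(x_chw, wt, C, H, W, K, R, S, repeats=3):
    """Time each candidate layout for one layer and return the fastest.

    Assumption: best-of-`repeats` wall-clock time is the selection metric.
    """
    buffers = {"CHW": x_chw, "HWC": to_hwc(x_chw, C, H, W)}
    timings = {}
    for name, buf in buffers.items():
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            direct_conv(buf, wt, C, H, W, K, R, S, name)
            runs.append(time.perf_counter() - start)
        timings[name] = min(runs)
    return min(timings, key=timings.get), timings
```

Both layouts produce identical outputs, so the choice affects only memory access order. In pure Python the timing gap says little about real hardware; on a DSP the measurement would wrap the vendor's convolution kernels instead, and the chosen layout could differ from layer to layer.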