Description: |
In this study, we propose a combined software-hardware solution for efficient sparse neural network computing. In a sparse neural network, many of the connections between layers are pruned. The weights are usually stored in a compressed format, but the corresponding feature map data must be paired with them before being passed to the computation engine. Because the compressed weights require indirect memory access, a large number of multiplexers is needed to locate the data positions. Motivated by this, we propose a new architecture with a much smaller data selection multiplexer design: the data are selected within a smaller range, so the scale of the multiplexers can be reduced. This hardware is paired with our software network pruning method. Unlike structured or pattern-based pruning methods, our algorithm imposes no such restriction; it only requires that each z-channel array of the weights contains the same number of non-zero elements, and these non-zero elements may be distributed at any positions within the array. We also use dual channels for better efficiency in data scheduling. Our experimental results show that our architecture achieves a 3x overall speedup on networks with 25% sparsity compared with a non-sparse engine using the same amount of computing resources. In the future, we plan to further improve our pruning algorithm and tape out our hardware design.
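  To make the pruning constraint concrete, below is a minimal sketch of how a z-channel-balanced pruning pass might look. The abstract only states the constraint (an equal number of non-zero elements in every z-channel array, at arbitrary positions), not the selection criterion, so the magnitude-based top-k choice, the assumed weight layout (out_ch, kh, kw, in_ch), and the `keep` parameter are illustrative assumptions, not the authors' actual algorithm.

    import numpy as np

    def prune_z_channel_balanced(weights, keep):
        # Assumed layout (out_ch, kh, kw, in_ch): the z-channel array is the
        # in_ch vector at each filter/spatial position.
        # Constraint from the abstract: every z-channel array keeps exactly
        # `keep` non-zero elements, at arbitrary positions.
        # Assumption: survivors are chosen by weight magnitude (standard
        # practice; the paper does not specify the criterion).
        out_ch, kh, kw, in_ch = weights.shape
        pruned = np.zeros_like(weights)
        for o in range(out_ch):
            for i in range(kh):
                for j in range(kw):
                    z = weights[o, i, j, :]              # one z-channel array
                    top = np.argsort(np.abs(z))[-keep:]  # indices of largest |w|
                    pruned[o, i, j, top] = z[top]        # non-zeros land anywhere
        return pruned

    # Example: keep 4 of 16 elements per z-channel array (75% of weights pruned).
    w = np.random.randn(8, 3, 3, 16)
    sparse_w = prune_z_channel_balanced(w, keep=4)
    assert (np.count_nonzero(sparse_w, axis=-1) == 4).all()

  Because every z-channel array ends up with the same non-zero count, each compressed weight vector has a fixed length, which is what lets the hardware select matching feature map data from a small, fixed range instead of through full-width multiplexers.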