Description: |
In this study, we propose a combined software-hardware solution for efficient sparse neural network computing. In a sparse neural network, many of the connections between layers are pruned. The weights are usually stored in a compressed format, but the corresponding feature map data must be paired with them before being passed to the computation engine. Because the compressed weights require indirect memory access, a large number of multiplexers is needed to locate the data positions. Motivated by this, we propose a new architecture with a much smaller data selection multiplexer design: the data are selected within a smaller range, so the scale of the multiplexers can be reduced. This hardware is paired with our software network pruning method. Unlike structured or pattern-based pruning methods, our algorithm imposes no such restriction; it only requires that each z-channel array of the weights contains the same number of non-zero elements, and these non-zero elements may be distributed at any positions within the array. We also use dual channels for better efficiency in data scheduling. Our experimental results show that our architecture achieves a 3x overall speedup on networks with 25% sparsity compared with a non-sparse engine using the same amount of computing resources. In the future, we plan to further improve our pruning algorithm and tape out our hardware design.
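  To make the pruning constraint concrete, below is a minimal sketch of how a z-channel-balanced pruning pass might look. The abstract only states the constraint (an equal number of non-zero elements in every z-channel array, at arbitrary positions), not the selection criterion, so the magnitude-based top-k choice, the assumed weight layout (out_ch, kh, kw, in_ch), and the `keep` parameter are illustrative assumptions, not the authors' actual algorithm.

    import numpy as np

    def prune_z_channel_balanced(weights, keep):
        # Assumed layout (out_ch, kh, kw, in_ch): the z-channel array is the
        # in_ch vector at each filter/spatial position.
        # Constraint from the abstract: every z-channel array keeps exactly
        # `keep` non-zero elements, at arbitrary positions.
        # Assumption: survivors are chosen by weight magnitude (standard
        # practice; the paper does not specify the criterion).
        out_ch, kh, kw, in_ch = weights.shape
        pruned = np.zeros_like(weights)
        for o in range(out_ch):
            for i in range(kh):
                for j in range(kw):
                    z = weights[o, i, j, :]              # one z-channel array
                    top = np.argsort(np.abs(z))[-keep:]  # indices of largest |w|
                    pruned[o, i, j, top] = z[top]        # non-zeros land anywhere
        return pruned

    # Example: keep 4 of 16 elements per z-channel array (75% of weights pruned).
    w = np.random.randn(8, 3, 3, 16)
    sparse_w = prune_z_channel_balanced(w, keep=4)
    assert (np.count_nonzero(sparse_w, axis=-1) == 4).all()

  Because every z-channel array ends up with the same non-zero count, each compressed weight vector has a fixed length, which is what lets the hardware select matching feature map data from a small, fixed range instead of through full-width multiplexers.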