Showing 1 - 10 of 50 for search: '"Yuanwu Lei"'
Published in:
IEEE Access, Vol 7, Pp 5808-5818 (2019)
Matrix transposition plays a critical role in digital signal processing. However, the existing matrix transposition implementations have significant limitations. A traditional design uses load and store instructions to accomplish matrix transposition
External link:
https://doaj.org/article/24d84e77c4a747d086bdd95eeb273dd3
Authors:
Kai Lu, Yaohua Wang, Yang Guo, Chun Huang, Sheng Liu, Ruibo Wang, Jianbin Fang, Tao Tang, Zhaoyun Chen, Biwei Liu, Zhong Liu, Yuanwu Lei, Haiyan Sun
Published in:
CCF Transactions on High Performance Computing. 4:150-164
Published in:
CCF Transactions on High Performance Computing. 3:114-125
Digital Signal Processors (DSPs) have been widely used in embedded domains, delivering high performance with ultra-low power consumption. Such promises make it attractive for more domains that DSP was not an option before. To show how DSP lives up to
Published in:
Chinese Journal of Electronics. 28:1158-1164
Range reduction is the initial and essential stage of function computation, but its pipelined implementation has the drawbacks of large cost and terrible accuracy. We proposed low cost and accurate pipelined range reduction, which adopts truncated mu
Published in:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26:1953-1966
Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes of different applications are very different. Therefore, this paper proposes a variable-size FFT hardware a
Published in:
IEEE Transactions on Circuits and Systems I: Regular Papers. 64:892-905
CORDIC algorithm is suitable to implement sine/cosine function, but the large number of iterations leads to great delay and overhead. Moreover, due to finite bit-width of operands and number of iterations, the relative error of floating-point sine or
Published in:
Chinese Journal of Electronics. 26:292-298
Focused on the issue that division is complex and needs a long latency to compute, a method to design the unit of high-performance Floating-point (FP) divider based on Goldschmidt algorithm was proposed. Bipartite reciprocal tables were adopted to ob
Published in:
HPCC/SmartCity/DSS
The Matrix2 Accelerator is a high-performance multi-core vector processor for high-density computing that supports fused multiply-add instructions. We propose an efficient large-scale 1D FFT vectorization method according to the architecture characte
Published in:
ISCAS
We design an efficient DMA controller for scientific computing accelerators. It supports several flexible and powerful transfers, including reshape transfers, parameter linking mechanism, and transfer chaining mechanism. We also optimize the DMA cont