Search results

Academic article

MT-DMA: A DMA Controller Supporting Efficient Matrix Transposition for Digital Signal Processing

Authors: Sheng Ma, Yuanwu Lei, Libo Huang, Zhiying Wang

Published in: IEEE Access, Vol. 7, pp. 5808-5818 (2019)

Matrix transposition plays a critical role in digital signal processing. However, existing matrix transposition implementations have significant limitations. A traditional design uses load and store instructions to accomplish matrix transposition…

External link: https://doaj.org/article/24d84e77c4a747d086bdd95eeb273dd3

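To make the load/store baseline mentioned in the abstract concrete, here is a minimal Python sketch (an illustration only, not the MT-DMA design): a row-major matrix held in a flat buffer is transposed element by element, so every element costs one load and one store. The function name `transpose_load_store` is hypothetical.

```python
def transpose_load_store(src, rows, cols):
    """Transpose a row-major rows x cols matrix held in a flat list.

    Every element is moved with one load and one store, mirroring a
    load/store-instruction implementation; the result is cols x rows,
    also row-major.
    """
    dst = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            value = src[r * cols + c]   # load: contiguous walk through the source
            dst[c * rows + r] = value   # store: stride of `rows` in the destination
    return dst


print(transpose_load_store([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```

For a rows x cols matrix this issues rows·cols loads and rows·cols stores, and the stores jump through memory with a stride of `rows` elements. This per-element, strided traffic is one plausible reason a load/store transposition is costly, though the snippet is cut off before the abstract states the limitations it has in mind.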

MT-3000: a heterogeneous multi-zone processor for HPC

Authors: Kai Lu, Yaohua Wang, Yang Guo, Chun Huang, Sheng Liu, Ruibo Wang, Jianbin Fang, Tao Tang, Zhaoyun Chen, Biwei Liu, Zhong Liu, Yuanwu Lei, Haiyan Sun

Published in: CCF Transactions on High Performance Computing, 4:150-164

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::76167d126979d58c10b18801334cc345
https://doi.org/10.1007/s42514-022-00095-y

Advancing DSP into HPC, AI, and beyond: challenges, mechanisms, and future directions

Authors: Yang Guo, Liu Chang, Sheng Liu, Zhang Yang, Chen Li, Yuanwu Lei, Yaohua Wang, Jian Zhang

Published in: CCF Transactions on High Performance Computing, 3:114-125

Digital Signal Processors (DSPs) have been widely used in embedded domains, delivering high performance at ultra-low power consumption. This promise makes DSPs attractive for domains where they were previously not an option. To show how DSP lives up to…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::a9259360652d025c3e6d179ed569b912
https://doi.org/10.1007/s42514-020-00057-2

Pipelined Range Reduction Based Truncated Multiplier

Authors: Yuanwu Lei, Zhu Baozhou, Yuanxi Peng

Published in: Chinese Journal of Electronics, 28:1158-1164

Range reduction is the initial and essential stage of function computation, but its pipelined implementation suffers from high cost and poor accuracy. We propose a low-cost, accurate pipelined range reduction, which adopts truncated mu…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::30d9d35f6e91ee39c38f54defe622f78
https://doi.org/10.1049/cje.2019.07.003

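As background for the range-reduction stage described above, a minimal Python sketch of a conventional (non-pipelined) reduction for sine follows. It uses a generic Cody-Waite style technique, not the truncated-multiplier design from the paper: x is written as k·(π/2) + r with |r| ≤ π/4, where π/2 is split into a high part (`PIO2_HI`, the fdlibm `pio2_1` constant) and a low-order correction (`PIO2_LO`, fdlibm `pio2_1t`), and the quadrant k mod 4 selects ±sin(r) or ±cos(r). The function names are hypothetical.

```python
import math

# pi/2 split into a high part with trailing zero bits (so k * PIO2_HI is exact
# for moderate k) and a low-order correction; values as in fdlibm.
PIO2_HI = 1.57079632673412561417e+00
PIO2_LO = 6.07710050650619224932e-11


def reduce_range(x):
    """Return (k, r) with x ~= k*(pi/2) + r and |r| <= ~pi/4 (for moderate |x|)."""
    k = round(x * (2.0 / math.pi))       # nearest quadrant index (an int)
    r = (x - k * PIO2_HI) - k * PIO2_LO  # subtract in two steps to limit rounding error
    return k, r


def sin_via_reduction(x):
    """Evaluate sin(x) on the reduced argument; the quadrant picks the identity."""
    k, r = reduce_range(x)
    q = k % 4                    # Python's % is non-negative, so negative k works
    if q == 0:
        return math.sin(r)       # sin(r)
    if q == 1:
        return math.cos(r)       # sin(r + pi/2) = cos(r)
    if q == 2:
        return -math.sin(r)      # sin(r + pi) = -sin(r)
    return -math.cos(r)          # sin(r + 3pi/2) = -cos(r)
```

In a pipelined hardware version, the product `x * (2/pi)` and the products with the split constants are where the multipliers sit, which appears to be where the paper's truncated multipliers come in; the snippet does not say exactly how they are applied.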

MT-DMA: A DMA Controller Supporting Efficient Matrix Transposition for Digital Signal Processing

Authors: Zhiying Wang, Yuanwu Lei, Libo Huang, Sheng Ma

Published in: IEEE Access, Vol. 7, pp. 5808-5818 (2019)

External links: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ee6d61b1e4b385e06e70584e55884832
https://ieeexplore.ieee.org/document/8587161/

A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition

Authors: Xiaowen Chen, Yuanwu Lei, Shuming Chen, Zhonghai Lu

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26:1953-1966

The fast Fourier transform (FFT) is the core and most time-consuming algorithm in digital signal processing, and FFT sizes vary widely across applications. Therefore, this paper proposes a variable-size FFT hardware a…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::12e2cb9f442c55d814ea2a174c455953
https://doi.org/10.1109/tvlsi.2018.2846688

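The abstract is cut off before it describes the accelerator, but the title ties variable-size FFT to matrix transposition. A common textbook way these connect is the four-step (Bailey) FFT: an N = N1·N2 transform is computed as length-N1 column transforms, a twiddle-factor multiplication, length-N2 row transforms, and a final transpose. The pure-Python sketch below (hypothetical names `dft`, `transpose`, `four_step_fft`) only shows where the transposes occur; it is not the paper's hardware design, and a naive DFT stands in for the small sub-FFTs.

```python
import cmath


def dft(seq):
    """Naive O(n^2) DFT, used here only as the small sub-transform."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*mat)]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 via column DFTs, twiddles, row DFTs and transposes."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n1 x n2 row-major matrix: m[a][b] = x[n2*a + b].
    m = [x[n2 * a:n2 * (a + 1)] for a in range(n1)]
    # Step 1: length-n1 DFTs down the columns (transpose, then transform rows).
    cols = [dft(row) for row in transpose(m)]          # cols[b][k1]
    # Step 2: multiply by twiddle factors W_N^(b*k1).
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Step 3: length-n2 DFTs along the rows of the transposed-back matrix.
    rows = [dft(row) for row in transpose(cols)]       # rows[k1][k2]
    # Step 4: X[k1 + n1*k2] = rows[k1][k2], so read the result out column-major.
    return [v for row in transpose(rows) for v in row]
```

Changing `n1` and `n2` reuses the same small transforms for different total sizes, which suggests why transposition is a natural building block for a variable-size FFT; how the paper's accelerator actually organizes this is not stated in the snippet.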

Low Latency and Low Error Floating-Point Sine/Cosine Function Based TCORDIC Algorithm

Authors: Yuanwu Lei, Yuanxi Peng, Tingting He, Zhu Baozhou

Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 64:892-905

The CORDIC algorithm is well suited to implementing the sine/cosine function, but its large number of iterations leads to long delay and high overhead. Moreover, due to the finite bit-width of the operands and the finite number of iterations, the relative error of floating-point sine or…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::64ac11b36df5c133511b283fd71e24ce
https://doi.org/10.1109/tcsi.2016.2631588

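For readers unfamiliar with CORDIC, a minimal floating-point Python sketch of classic rotation-mode CORDIC follows; it is not the TCORDIC algorithm of the paper. Each iteration rotates the vector (x, y) by ±atan(2^-i), which hardware does with shifts and adds (modeled here as multiplication by 2^-i), and gains roughly one bit of accuracy, so a large iteration count translates directly into the delay the abstract mentions. The function name `cordic_sin_cos` and the default of 32 iterations are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC returning (sin, cos).

    Converges for |theta| <= sum(atan(2^-i)) ~= 1.7433 rad; larger angles
    need range reduction first.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); starting x at
    # the inverse of the total gain K pre-compensates for it.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]                   # remaining angle to rotate through
    return y, x
```

After n iterations the residual angle is bounded by atan(2^-(n-1)), so the error shrinks by about half per iteration; trading that iteration count against accuracy is the tension the abstract describes.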

High‐Performance FP Divider with Sharing Multipliers Based on Goldschmidt Algorithm

Authors: Tingting He, Yuanxi Peng, Jiyang Chen, Zhu Baozhou, Yuanwu Lei

Published in: Chinese Journal of Electronics, 26:292-298

Because division is complex and has a long latency, a method for designing a high-performance floating-point (FP) divider unit based on the Goldschmidt algorithm was proposed. Bipartite reciprocal tables were adopted to ob…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::e4007246b37545f9d1f54ad19283fe6b
https://doi.org/10.1049/cje.2016.10.004

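A minimal Python sketch of Goldschmidt division follows, to show the iteration the abstract names. The bipartite reciprocal tables of the paper are replaced by a hypothetical single-level table (`reciprocal_seed`) that approximates 1/d from the top `table_bits` fraction bits of d; each step then multiplies numerator and divisor by the same factor F = 2 - D, so the divisor converges quadratically to 1 and the numerator to the quotient. The divisor is assumed to be normalized to [1, 2), as a floating-point significand would be.

```python
def reciprocal_seed(d, table_bits=8):
    """Hypothetical stand-in for a reciprocal lookup table on [1, 2).

    Uses only the top `table_bits` fraction bits of d and returns 1/midpoint
    of the selected interval, so the relative error is below 2**-(table_bits+1).
    """
    index = int((d - 1.0) * (1 << table_bits))          # truncated table index
    midpoint = 1.0 + (index + 0.5) / (1 << table_bits)  # centre of that interval
    return 1.0 / midpoint


def goldschmidt_divide(n, d, iterations=4, table_bits=8):
    """Compute n / d for d in [1, 2) by Goldschmidt iteration."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalised to [1, 2)")
    f = reciprocal_seed(d, table_bits)
    for _ in range(iterations):
        n, d = n * f, d * f   # two independent multiplications per step
        f = 2.0 - d           # next correction factor; d -> 1, n -> quotient
    return n
```

The products `n * f` and `d * f` in each step do not depend on each other, so hardware can compute them in parallel or schedule them on shared multipliers; how the paper shares its multipliers is not described in the snippet. Each step roughly doubles the number of correct bits, so an 8-bit seed reaches double precision within four steps.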

Efficient Large-Scale 1D FFT Vectorization on Multi-Core Vector Accelerator

Authors: Zhong Liu, Xi Tian, Xiaowen Chen, Yuanwu Lei, Man Liao

Published in: HPCC/SmartCity/DSS

The Matrix2 Accelerator is a high-performance multi-core vector processor for high-density computing that supports fused multiply-add instructions. We propose an efficient large-scale 1D FFT vectorization method based on the architecture characte…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::d93427e8972727fb983f908530e98747
https://doi.org/10.1109/hpcc/smartcity/dss.2019.00078

An Efficient Direct Memory Access (DMA) Controller for Scientific Computing Accelerators

Authors: Libo Huang, Yang Guo, Sheng Ma, Yuanwu Lei, Zhiying Wang

Published in: ISCAS

We design an efficient DMA controller for scientific computing accelerators. It supports several flexible and powerful transfer modes, including reshape transfers, a parameter linking mechanism, and a transfer chaining mechanism. We also optimize the DMA cont…

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::0120f4795ccac039675f27b6465fdba4
https://doi.org/10.1109/iscas.2019.8702172

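The abstract names reshape transfers and transfer chaining without giving details. As a generic illustration only, this Python sketch models a 2D strided transfer descriptor with a link to the next descriptor, and an engine that walks the chain until the link is empty. The descriptor layout and every field name are hypothetical, not the controller's actual format. Because source and destination strides are chosen independently, a single descriptor can reshape data, for example by transposing a small matrix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical 2D transfer descriptor; offsets and strides are in elements."""
    src: int               # first source element
    dst: int               # first destination element
    count: int             # elements per row
    rows: int              # number of rows
    src_elem_stride: int   # source step between elements of a row
    src_row_stride: int    # source step between rows
    dst_elem_stride: int   # destination step between elements of a row
    dst_row_stride: int    # destination step between rows
    next_desc: Optional["Descriptor"] = None  # chaining: descriptor to run next


def run_chain(desc: Optional[Descriptor], src_mem: List[int], dst_mem: List[int]) -> None:
    """Execute each descriptor in the chain in order, copying element by element."""
    while desc is not None:
        for r in range(desc.rows):
            for e in range(desc.count):
                s = desc.src + r * desc.src_row_stride + e * desc.src_elem_stride
                d = desc.dst + r * desc.dst_row_stride + e * desc.dst_elem_stride
                dst_mem[d] = src_mem[s]
        desc = desc.next_desc  # follow the link; None ends the chain


# Transpose a 2x3 row-major matrix into a 3x2 one with a single descriptor:
# destination row r gathers source column r (source stride 3 between elements).
src = [1, 2, 3, 4, 5, 6]
dst = [0] * 6
run_chain(Descriptor(src=0, dst=0, count=2, rows=3,
                     src_elem_stride=3, src_row_stride=1,
                     dst_elem_stride=1, dst_row_stride=2), src, dst)
print(dst)  # [1, 4, 2, 5, 3, 6]
```

Chaining lets software queue several such transfers up front so the engine runs them back to back without further intervention; whether the paper's parameter linking works this way is not stated in the snippet.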
