A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition
Authors: | Xiaowen Chen, Yuanwu Lei, Shuming Chen, Zhonghai Lu |
---|---|
Publication year: | 2018 |
Subject: |
Very-large-scale integration; Computer science; 020208 electrical & electronic engineering; Fast Fourier transform; Bandwidth (signal processing); 02 engineering and technology; Parallel computing; 020202 computer hardware & architecture; Hardware and Architecture; Transpose; 0202 electrical engineering, electronic engineering, information engineering; Hardware acceleration; Electrical and Electronic Engineering; Field-programmable gate array; Software; Digital signal processing; Twiddle factor |
Source: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 26:1953-1966 |
ISSN: | 1557-9999 1063-8210 |
Description: | Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes required by different applications vary widely. Therefore, this paper proposes a variable-size FFT hardware accelerator, which fully supports the IEEE-754 single-precision floating-point standard and FFT calculation over a wide size range, from 2 to $2^{20}$ points. First, a parallel Cooley–Tukey FFT algorithm based on matrix transposition (MT) is proposed, which can efficiently divide a large-size FFT into several small-size FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several FFT performance optimization techniques such as hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1 GHz with an area of 2.4 mm² and a power consumption of 91.3 mW at 25 °C, 0.9 V. Finally, several experiments are carried out to evaluate the proposal's performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that our FFT hardware accelerator achieves speedups of up to $18.89\times$ in comparison to two software-only solutions and two hardware-dedicated solutions. |
Database: | OpenAIRE |
External link: |
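
The description mentions a Cooley–Tukey FFT based on matrix transposition that splits a large FFT into small FFTs executable in parallel. The record does not give the paper's exact formulation, so the following is only a minimal sketch of the textbook transpose-based ("six-step") Cooley–Tukey decomposition, not the authors' algorithm or hardware design. The names `dft`, `transpose`, and `fft_via_transpose`, and the factor parameters `n1` and `n2`, are hypothetical, and a naive DFT stands in for the small FFT kernels.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here as a stand-in for a small FFT kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def fft_via_transpose(x, n1, n2):
    """Length n1*n2 DFT built from short row DFTs and three transposes.

    Assumed index mapping: input n = n2*j1 + j2, output k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(x) == n
    # View the input as an n1 x n2 row-major matrix a[j1][j2].
    a = [list(x[r * n2:(r + 1) * n2]) for r in range(n1)]
    # Transpose 1: each column becomes a contiguous row of length n1.
    at = transpose(a)
    # n2 independent length-n1 DFTs; these rows could be processed in parallel.
    b = [dft(row) for row in at]
    # Twiddle multiplication: b[j2][k1] *= W_n^(j2*k1).
    for j2 in range(n2):
        for k1 in range(n1):
            b[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Transpose 2: rows are now indexed by k1 and have length n2.
    c = transpose(b)
    # n1 independent length-n2 DFTs; again parallelizable.
    d = [dft(row) for row in c]
    # Transpose 3: flattening the result row-major gives index k1 + n1*k2.
    e = transpose(d)
    return [v for row in e for v in row]
```

For instance, `fft_via_transpose(x, 4, 4)` on a 16-point input should agree with `dft(x)` up to floating-point rounding. The three transposes keep every short DFT operating on contiguous rows, which is presumably why transposition is attractive for a banked-memory accelerator, although the record itself does not spell out that reasoning.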