A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics
Author: | Davide Rossi, Stefan Mach, Luca Benini, Simone Benatti, Angelo Garofalo, Fabio Montagna, Gianmarco Ottavi, Giuseppe Tagliavini |
---|---|
Contributors: | Montagna F., Mach S., Benatti S., Garofalo A., Ottavi G., Benini L., Rossi D., Tagliavini G. |
Publication year: | 2022 |
Subject: |
020203 distributed computing; sub-word vectorization; Floating point; parallel computing; Computer science; near-sensor computing; RISC-V; transprecision; 02 engineering and technology; Energy consumption; Power budget; Toolchain; Computational Theory and Mathematics; Computer engineering; Hardware and Architecture; Computer cluster; Signal Processing; Vectorization (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Programming paradigm; FPU interconnect |
Source: | IEEE Transactions on Parallel and Distributed Systems, 33 (5) |
ISSN: | 1045-9219, 1558-2183, 2161-9883 |
Description: | Recent applications in low-power (1-20 mW) near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this article, we propose a low-power multi-core computing cluster that leverages the fine-grained tunable principles of transprecision computing to support near-sensor applications at a minimum power budget. Our solution, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with a dedicated interconnect design capable of sharing floating-point units (FPUs) among the cores. On top of this architecture, we provide a full-fledged software stack, including a parallel low-level runtime, a compilation toolchain, and a high-level programming model, to support the development of end-to-end applications. We performed an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, varying the number of cores and FPUs to maximize performance. Orthogonally, we performed a vertical exploration to identify the most efficient solutions in terms of non-functional requirements (operating frequency, power, and area). We conducted an experimental assessment on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post-place-and-route analysis of the power consumption. A comparison with the state of the art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors. Finally, a real-life use case demonstrates the effectiveness of our approach in fulfilling accuracy constraints. |
Database: | OpenAIRE |
External link: |