Neural architecture search, memristive crossbars, non-idealities, adversarial robustness, EDAP.

Authors: LEÓN-VEGA, LUIS G., SALAZAR-VILLALOBOS, EDUARDO, RODRIGUEZ-FIGUEROA, ALEJANDRO, CASTRO-GODÍNEZ, JORGE
Subject:
Source: ACM Transactions on Embedded Computing Systems; Jul 2023, Vol. 22 Issue 4, p1-27, 27p
Abstract: Low power budgets and scarce computational resources limit computation at the edge. Approximate computing offers promising techniques for designing accelerators that cope with these inherent limitations, and high-level synthesis with C++ opens the opportunity to use metaprogramming for specialisable generic designs. This work proposes a framework for automatically generating synthesis-time configurable processing elements (PEs) for matrix multiplication-addition (GEMMA) and convolution. To evaluate our work, we perform a design exploration varying data bit-width, operand sizes, and kernel sizes. Our analyses cover resource consumption scaling, clocks-to-solution, design efficiency, and error distribution, presenting a comprehensive view of how the parameters affect the properties of our generic implementations. The GEMMA PEs presented a trade-off between granularity and efficiency, where design efficiency favours large PEs with short data widths, achieving, theoretically, up to 75 GMAC/s on a Xilinx XC7Z020 @ 100 MHz with an efficiency of 27%. For design efficiency, we propose a figure of merit that evaluates operations per second and resource utilisation with respect to the maximum achievable by the FPGA. Regarding the convolution PEs, we implemented two algorithms: a window-based spatial convolution and Winograd. The former is the best in terms of performance with 150 GMAC/s, reaching up to 47% efficiency. Winograd performed better numerically with a 3 × 3 kernel, presenting a mean error of 11.01% with 4-bit operands and a PSNR of 16.28 dB, compared to the spatial convolution with a mean error of 38.2% and a PSNR of 5.89 dB. Finally, we discuss how the error depends mostly on the PE’s parameters. In the GEMMA, the error depends on the matrix size, which limits PE scaling, but the PEs remain applicable to accelerators. The PEs developed during this research will lead to further granular approximate accelerator research. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
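
Note on the error metrics: the abstract reports a mean error in percent and a PSNR in dB for reduced-precision PEs, but this record does not give the formulas. The following is a minimal sketch of how such figures are commonly computed. It assumes operands normalised to [-1, 1), a mean absolute error expressed relative to the full-scale range, and the standard PSNR definition. The function names (`quantise`, `mean_error_percent`, `psnr_db`) and the normalisation are illustrative assumptions, not the authors' definitions.

```python
import math


def quantise(x, bits):
    """Round x (assumed in [-1, 1)) to a signed fixed-point grid of `bits` bits."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate to the representable range
    return q / scale


def mean_error_percent(ref, approx, full_scale=2.0):
    """Mean absolute error as a percentage of the full-scale range (assumed metric)."""
    total = sum(abs(r - a) for r, a in zip(ref, approx))
    return 100.0 * total / (len(ref) * full_scale)


def psnr_db(ref, approx, peak=1.0):
    """Standard PSNR: 10 * log10(peak^2 / MSE); infinite when the signals match."""
    mse = sum((r - a) ** 2 for r, a in zip(ref, approx)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)
```

With these definitions, a reference output and the output of a 4-bit PE can be compared by calling `mean_error_percent(ref, approx)` and `psnr_db(ref, approx)` on the two sequences. A different normalisation would change the absolute numbers but not the ranking between two algorithms evaluated on the same data.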