Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators

Author:	Maurice Peemen, Bart Mesman, Henk Corporaal
Contributors:	Electronic Systems
Language:	English
Publication year:	2015
Source:	Proceedings of the Conference on Design, Automation and Test in Europe, DATE, 9-13 March 2015, Grenoble, France, pp. 169-174
Description:	The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations such as interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement by up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% of FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel i7 processor.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::afbf5b7b730fa782277b50794f00bd31 https://research.tue.nl/nl/publications/fac955c8-d7be-4fa3-b043-622d55def44f