CASCADE

Authors:	Anuj Pathania, Dhananjaya Wijerathne, Zhaoying Li, Manupa Karunarathne, Tulika Mitra
Publication year:	2019
Subject:	Computer science; Computation; Parallel computing; Memory address; Memory bank; Data access; Hardware and Architecture; Cascade; Compiler; Throughput (business); Software; Performance per watt
Source:	ACM Transactions on Embedded Computing Systems. 18:1-26
ISSN:	1558-3465 1539-9087
Description:	A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance, low-power accelerator for compute-intensive loop kernels. While the mapping of computations onto the CGRA is a well-studied problem, bringing data into the array at high throughput remains a challenge. A conventional CGRA design performs on-array computations to generate the memory addresses for data access, which undermines the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates memory access from the actual computations, resulting in significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE, with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for multi-bank data memory access to custom-designed programmable hardware. An end-to-end, fully automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show, on average, a 3× performance benefit and a 2.2× performance-per-watt improvement for CASCADE compared to an iso-area conventional CGRA that has a bigger processing array in lieu of dedicated hardware memory address generation logic.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::ef4f08ec75f73d51f90c20ddbce8f580 https://doi.org/10.1145/3358177