Hardware Based Loop Optimization for CGRA Architectures

Autor: Kevin Martin, Chilankamol Sunny, Satyajit Das, Philippe Coussy
Přispěvatelé: Indian Institut of Technology [Palakkad] (ITT Palakkad), Equipe Hardware ARchitectures and CAD tools (Lab-STICC_ARCAD), Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance (Lab-STICC), École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT), Université de Bretagne Sud - Lorient (UBS Lorient), Université de Bretagne Sud (UBS)
Jazyk: angličtina
Rok vydání: 2021
Předmět:
Zdroj: Applied Reconfigurable Computing. Architectures, Tools, and Applications
Applied Reconfigurable Computing. Architectures, Tools, and Applications, Jun 2021, Rennes, France. pp.65-80, ⟨10.1007/978-3-030-79025-7_5⟩
Applied Reconfigurable Computing. Architectures, Tools, and Applications ISBN: 9783030790240
ARC
DOI: 10.1007/978-3-030-79025-7_5⟩
Popis: With the increasing demand for high performance computing in application domains with stringent power budgets, coarse-grained reconfigurable array (CGRA) architectures have become a popular choice among researchers and manufacturers. Loops are the hot-spots of kernels running on CGRAs and hence several techniques have been devised to optimize the loop execution. However, works in this direction are predominantly software-based solutions. This paper addresses the optimization opportunities at a deeper level and introduces a hardware based loop control mechanism that can support arbitrarily nested loops up to four levels. Major contributions of this work are, a lightweight Hardware Loop Block (HLB) for CGRAs that eliminates control instruction overhead of loops and an acyclic graph transformation that removes loop branches from the application CDFG. When tested on a set of kernels chosen from various application domains, the design could achieve a maximum of 1.9\(\times \) and an average of 1.5\(\times \) speed-up against the conventional approach. The total number of instructions executed is reduced to half for almost all the kernels with an area and power consumption overhead of 2.6% and 0.8% respectively.
Databáze: OpenAIRE