Run fast when you can: Loop pipelining with uncertain and non-uniform memory dependencies

Authors:	John Wickerson, Junyi Liu, George A. Constantinides, Samuel Bayliss
Contributors:	Royal Academy Of Engineering, Imagination Technologies Ltd, Engineering & Physical Science Research Council (EPSRC)
Publication year:	2017
Subject:	Technology; Schedule; Science & Technology; Computer Science, Information Systems; Computer science; Pipeline (computing); Engineering, Electrical & Electronic; 02 engineering and technology; Parallel computing; 020202 computer hardware & architecture; Loop splitting; Variable (computer science); Engineering; Computer Science; Telecommunications; 0202 electrical engineering, electronic engineering, information engineering; Polytope model; Overhead (computing)
Source:	ACSSC 52nd Annual Asilomar Conference on Signals, Systems, and Computers
DOI:	10.1109/acssc.2017.8335151
Description:	Loop pipelining, a key optimisation in high-level synthesis (HLS), achieves high performance through static scheduling. When a loop contains non-trivial memory dependencies, current HLS tools must apply a conservative pipeline schedule, which leads to nearly sequential execution. In this paper, we demonstrate the use of a parametric polyhedral model to capture mathematically memory dependence patterns that are uncertain (i.e., parameterised by an undetermined variable) and/or non-uniform (i.e., varying between loop iterations). Assuming that the loop is always executed with an aggressive (fast) pipeline schedule, this static analysis generates the parameter conditions under which that execution is safe and the parametric break points at which it encounters memory conflicts. We then feed this information into an automated source-to-source code transformation that implements parametric loop pipelining and loop splitting. The transformed loop is synthesised by Vivado HLS, and its execution speed can be adjusted at runtime to avoid memory conflicts. Experiments on a set of benchmark loops show that our optimisation can significantly improve runtime pipeline performance with a reasonable overhead in hardware resources.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3020dec80de0ff5deff1beed4638b09c https://doi.org/10.1109/acssc.2017.8335151
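
Illustrative note: the idea in the description, running at full pipeline speed when a runtime parameter makes that safe and slowing down otherwise, can be sketched with a toy model. This is not the paper's polyhedral analysis or its generated code. It assumes a single loop `A[i + m] = A[i] + 1` with an uncertain but uniform dependence distance `m`, and a hypothetical pipeline in which each iteration reads at its start cycle and its write becomes visible `latency` cycles later. All function names below are made up for this sketch.

```python
import math


def min_safe_ii(m: int, latency: int) -> int:
    """Smallest initiation interval (II) that is safe in the toy model.

    Iteration i + m reads what iteration i wrote, so the hazard exists only
    for 0 < m < latency; otherwise the aggressive schedule II = 1 is safe.
    """
    if m <= 0 or m >= latency:
        return 1
    return math.ceil(latency / m)  # need m * II >= latency


def loop_bounds(m: int, n: int):
    """Iteration range that keeps both i and i + m inside an array of size n."""
    return max(0, -m), n - max(0, m)


def run_sequential(a, m):
    """Reference semantics: one iteration fully completes before the next."""
    a = list(a)
    lo, hi = loop_bounds(m, len(a))
    for i in range(lo, hi):
        a[i + m] = a[i] + 1
    return a


def run_pipelined(a, m, ii, latency):
    """Cycle-level simulation of the assumed pipeline with a given II."""
    a = list(a)
    lo, hi = loop_bounds(m, len(a))
    in_flight = {}  # commit cycle -> (array index, value)
    last_cycle = (hi - lo - 1) * ii + latency
    for cycle in range(last_cycle + 1):
        # Writes scheduled for this cycle become visible before any read.
        if cycle in in_flight:
            idx, val = in_flight.pop(cycle)
            a[idx] = val
        # A new iteration is issued every `ii` cycles and reads immediately.
        if cycle % ii == 0 and lo + cycle // ii < hi:
            i = lo + cycle // ii
            in_flight[cycle + latency] = (i + m, a[i] + 1)
    return a


if __name__ == "__main__":
    data, latency = [0] * 16, 4
    for m in range(-3, 8):
        ii = min_safe_ii(m, latency)
        ok = run_pipelined(data, m, ii, latency) == run_sequential(data, m)
        print(f"m={m:2d}  II={ii}  matches sequential: {ok}")
```

In this sketch the value of `m` plays the role of the undetermined parameter: the condition `m <= 0 or m >= latency` is the toy counterpart of a parameter condition under which the fast schedule is safe. Non-uniform dependencies and loop splitting at break points are not modelled here.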