Run fast when you can: Loop pipelining with uncertain and non-uniform memory dependencies
Author: | John Wickerson, Junyi Liu, George A. Constantinides, Samuel Bayliss |
---|---|
Contributors: | Engineering & Physical Science Research Council (E, Royal Academy Of Engineering, Imagination Technologies Ltd, Engineering & Physical Science Research Council (EPSRC) |
Year of publication: | 2017 |
Subject: |
Technology; Schedule; Science & Technology; Computer Science, Information Systems; Computer science; Pipeline (computing); Engineering, Electrical & Electronic; 02 engineering and technology; Parallel computing; 020202 computer hardware & architecture; Loop splitting; Variable (computer science); Engineering; Computer Science; Telecommunications; 0202 electrical engineering, electronic engineering, information engineering; Polytope model; Overhead (computing) |
Source: | ACSSC 52nd Annual Asilomar Conference on Signals, Systems, and Computers |
DOI: | 10.1109/acssc.2017.8335151 |
Description: | As a key optimisation method in high-level synthesis (HLS), high-performance loop pipelining is enabled by a static scheduling algorithm. When the loop contains non-trivial memory dependencies, current HLS tools have to apply a conservative pipeline schedule, which leads to nearly sequential execution. In this paper, we demonstrate the use of a parametric polyhedral model to mathematically capture uncertain (i.e., parameterised by an undetermined variable) and/or non-uniform (i.e., varying between loop iterations) memory dependence patterns. Based on this static analysis, assuming the loop is always executed with an aggressive (fast) pipeline schedule, we can generate the parameter conditions under which this execution is safe and the parametric break points at which the execution encounters memory conflicts. We then apply this information in an automated source-to-source code transformation, which implements parametric loop pipelining and loop splitting. The transformed loop is synthesised by Vivado HLS, and its execution speed can be adjusted at runtime to avoid memory conflicts. Experiments over a set of benchmark loops show that our optimisation can improve the runtime pipeline performance significantly with a reasonable overhead in hardware resources. |
Database: | OpenAIRE |
External link: |
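
The idea in the description, running with a fast schedule only when a runtime parameter makes that safe, can be illustrated with a minimal C sketch. It is not taken from the paper: the loop body, the names `choose_ii` and `kernel`, and the value of `LOAD_TO_STORE_LATENCY` are hypothetical. The sketch assumes a loop whose read-after-write dependence distance is the runtime parameter `m`, and uses the standard recurrence bound II >= ceil(latency / distance) to pick an initiation interval (II). Loop splitting for non-uniform dependencies is not shown.

```c
/* Hypothetical number of cycles between the load of A[i] and the store to
   A[i + m] in the pipelined datapath (an assumption for illustration). */
#define LOAD_TO_STORE_LATENCY 4

/* Smallest initiation interval that respects the loop-carried
   read-after-write dependence of distance m (m is unknown until runtime). */
int choose_ii(int m, int latency)
{
    if (m <= 0)
        return 1;                 /* no loop-carried RAW dependence */
    return (latency + m - 1) / m; /* ceil(latency / m); 1 when m >= latency */
}

/* Caller must guarantee that indices i and i + m are in bounds for
   0 <= i < n. Both branches compute the same values in software; in an HLS
   flow only their pipeline schedules would differ. */
void kernel(float *A, int n, int m)
{
    if (choose_ii(m, LOAD_TO_STORE_LATENCY) == 1) {
        /* Fast version: would be pipelined with II = 1, with the tool told
           that the dependence on A cannot cause a conflict. */
        for (int i = 0; i < n; i++)
            A[i + m] = A[i] + 0.5f;
    } else {
        /* Conservative version: would be pipelined with the larger II. */
        for (int i = 0; i < n; i++)
            A[i + m] = A[i] + 0.5f;
    }
}
```

With the assumed latency of 4, `choose_ii(1, 4)` gives 4 (nearly sequential), while any `m` of 4 or more, or any non-positive `m`, allows II = 1. The exact safe bound in a real design depends on the schedule the HLS tool produces.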