Minimal Unroll Factor for Code Generation of Software Pipelining

Autor: Sid Touati, Albert Cohen, David Gregg, Frederic Brault, Mounira Bachir
Přispěvatelé: Architectures, Languages and Compilers to Harness the End of Moore Years (ALCHEMY), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Models and methods of analysis and optimization for systems with real-time and embedding constraints (AOSTE), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED), Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), Trinity College Dublin, Parallélisme de Kahn Synchrone (Parkas ), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), DIGITEO (contrat numéro 2007-10D)., Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Paris-Rocquencourt, Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), ANR MOPUCE project, DIGITEO foundation
Jazyk: angličtina
Rok vydání: 2012
Předmět:
[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR]
Computer science
Embedded systems
Instruction level parallelism
ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors
02 engineering and technology
Parallel computing
code generation
Periodic register allocation
computer.software_genre
periodic register allocation
ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.6: Optimization
Theoretical Computer Science
Software pipelining
ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.0: Code generation
0202 electrical engineering
electronic engineering
information engineering

Code generation
Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION
software pipelining
Loop unrolling
Processor register
Register renaming
020207 software engineering
instruction level parallelism
Compilation
020202 computer hardware & architecture
[INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF]
Very long instruction word
ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.1: Compilers
compilation
embedded systems
Compiler
computer
Software
Information Systems
Register allocation
Zdroj: International Journal of Parallel Programming
International Journal of Parallel Programming, Springer Verlag, 2012, ⟨10.1007/s10766-012-0203-z⟩
International Journal of Parallel Programming, 2012, ⟨10.1007/s10766-012-0203-z⟩
ISSN: 0885-7458
1573-7640
DOI: 10.1007/s10766-012-0203-z⟩
Popis: International audience; We address the problem of generating compact code from software pipelined loops. Although software pipelining is a powerful technique to extract fine- grain parallelism, it generates lifetime intervals spanning multiple loop iterations. These intervals require periodic register allocation (also called variable expansion), which in turn yields a code generation challenge. We are looking for the minimal unrolling factor enabling the periodic register allocation of software pipelined kernels. This challenge is generally addressed through one of: (1) hardware support in the form of rotating register files, which solve the unrolling problem but are expensive in hard- ware; (2) register renaming by inserting register moves, which increase the number of operations in the loop, and may damage the schedule of the software pipeline and reduce throughput; (3) post-pass loop unrolling that does not compromise throughput but often leads to impractical code growth. The latter approach relies on the proof that MAXLIVE registers (maximal number of values simultaneously alive) are sufficient for periodic register allocation (Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995; Hendren et al. in CC '92: Proceedings of the 4th International Conference on Compiler Construction, pages 176-191, London, UK, 1992). However, the best existing heuristic for controlling this code growth--modulo variable expansion (Lam in SIGPLAN Not 23(7):318-328, 1988)--may not apply the correct amount of loop unrolling to guarantee that MAXLIVE registers are enough, which may result in register spills Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995. This paper presents our research results on the open problem of minimal loop unrolling, allowing a software-only code generation M. Bachir * S.-A.-A. Touati (B) * F. Brault * D. Gregg * A. Cohen University of Nice Sophia-Antipolis, Nice, France e-mail: Sid.Touati@inria.fr 123  that does not trade the optimality of the initiation interval ( I I ) for the compactness of the generated code. Our novel idea is to use the remaining free registers after periodic register allocation to relax the constraints on register reuse. The problem of minimal loop unrolling arises either before or after software pipelining, either with a single or with multiple register types (classes). We provide a formal problem definition for each scenario, and we propose and study a dedicated algorithm for each problem. Our solu- tions are implemented within an industrial-strength compiler for a VLIW embedded processor from STMicroelectronics, and validated on multiple benchmarks suites.
Databáze: OpenAIRE