Minimal Unroll Factor for Code Generation of Software Pipelining

Autor:	Sid Touati, Albert Cohen, David Gregg, Frederic Brault, Mounira Bachir
Přispěvatelé:	Architectures, Languages and Compilers to Harness the End of Moore Years (ALCHEMY), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Models and methods of analysis and optimization for systems with real-time and embedding constraints (AOSTE), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED), Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), Trinity College Dublin, Parallélisme de Kahn Synchrone (Parkas ), Département d'informatique - ENS Paris (DI-ENS), Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), DIGITEO (contrat numéro 2007-10D)., Département d'informatique de l'École normale supérieure (DI-ENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Paris-Rocquencourt, Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), ANR MOPUCE project, DIGITEO foundation
Jazyk:	angličtina
Rok vydání:	2012
Předmět:	[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] Computer science Embedded systems Instruction level parallelism ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors 02 engineering and technology Parallel computing code generation Periodic register allocation computer.software_genre periodic register allocation ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.6: Optimization Theoretical Computer Science Software pipelining ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.0: Code generation 0202 electrical engineering electronic engineering information engineering Code generation Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION software pipelining Loop unrolling Processor register Register renaming 020207 software engineering instruction level parallelism Compilation 020202 computer hardware & architecture [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF] Very long instruction word ACM: D.: Software/D.3: PROGRAMMING LANGUAGES/D.3.4: Processors/D.3.4.1: Compilers compilation embedded systems Compiler computer Software Information Systems Register allocation
Zdroj:	International Journal of Parallel Programming International Journal of Parallel Programming, Springer Verlag, 2012, ⟨10.1007/s10766-012-0203-z⟩ International Journal of Parallel Programming, 2012, ⟨10.1007/s10766-012-0203-z⟩
ISSN:	0885-7458 1573-7640
DOI:	10.1007/s10766-012-0203-z⟩
Popis:	International audience; We address the problem of generating compact code from software pipelined loops. Although software pipelining is a powerful technique to extract fine- grain parallelism, it generates lifetime intervals spanning multiple loop iterations. These intervals require periodic register allocation (also called variable expansion), which in turn yields a code generation challenge. We are looking for the minimal unrolling factor enabling the periodic register allocation of software pipelined kernels. This challenge is generally addressed through one of: (1) hardware support in the form of rotating register files, which solve the unrolling problem but are expensive in hard- ware; (2) register renaming by inserting register moves, which increase the number of operations in the loop, and may damage the schedule of the software pipeline and reduce throughput; (3) post-pass loop unrolling that does not compromise throughput but often leads to impractical code growth. The latter approach relies on the proof that MAXLIVE registers (maximal number of values simultaneously alive) are sufficient for periodic register allocation (Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995; Hendren et al. in CC '92: Proceedings of the 4th International Conference on Compiler Construction, pages 176-191, London, UK, 1992). However, the best existing heuristic for controlling this code growth--modulo variable expansion (Lam in SIGPLAN Not 23(7):318-328, 1988)--may not apply the correct amount of loop unrolling to guarantee that MAXLIVE registers are enough, which may result in register spills Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995. This paper presents our research results on the open problem of minimal loop unrolling, allowing a software-only code generation M. Bachir * S.-A.-A. Touati (B) * F. Brault * D. Gregg * A. Cohen University of Nice Sophia-Antipolis, Nice, France e-mail: Sid.Touati@inria.fr 123 that does not trade the optimality of the initiation interval ( I I ) for the compactness of the generated code. Our novel idea is to use the remaining free registers after periodic register allocation to relax the constraints on register reuse. The problem of minimal loop unrolling arises either before or after software pipelining, either with a single or with multiple register types (classes). We provide a formal problem definition for each scenario, and we propose and study a dedicated algorithm for each problem. Our solu- tions are implemented within an industrial-strength compiler for a VLIW embedded processor from STMicroelectronics, and validated on multiple benchmarks suites.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::606ed84b492694b7b86e4a5b5ee87caf https://hal.inria.fr/hal-00764521 Zobrazit plný text záznamu Full text from SpringerLink