Automatic Parallelization to Asynchronous Task-Based Runtimes Through a Generic Runtime Layer
Authors: | Muthu Baskaran, Jonathan Springer, Charles Jin, Benoit Meister |
---|---|
Year of publication: | 2019 |
Subject: |
020203 distributed computing; Computer science; Semantics (computer science); Optimizing compiler; 010103 numerical & computational mathematics; 02 engineering and technology; Parallel computing; 01 natural sciences; Automatic parallelization; Task (computing); Runtime system; Asynchronous communication; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Compiler; 0101 mathematics |
Source: | HPEC |
DOI: | 10.1109/hpec.2019.8916294 |
Description: | With the end of Moore’s law, asynchronous task-based parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability. This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0× (OpenMP) and 9.5× (Legion) using 64 threads. |
Database: | OpenAIRE |
External link: |