Description: |
In a post-Moore world, asynchronous task-based parallelism has become a popular paradigm for parallel programming. Auto-parallelizing compilers are also an active area of research, promising improved developer productivity and application performance. This work seeks to unify these efforts by delivering an end-to-end path for auto-parallelization through a generic runtime layer for asynchronous task-based systems. First, we extend R-Stream, an auto-parallelizing polyhedral compiler, to express task-based parallelism and data management for a broader class of task-based runtimes. We additionally introduce a generic runtime layer for asynchronous task-based parallelism, which provides an abstract target for the compiler backend. We implement this generic runtime layer using OpenMP for shared memory systems and Legion for distributed memory systems. Starting from sequential source, we obtain geometric mean speedups of 23.0x (OpenMP) and 9.5x (Legion) on a wide range of applications, from deep learning to scientific kernels. |