On Optimizing Complex Stencils on GPUs

Authors: Aravind Sukumaran-Rajam, P. Sadayappan, Atanas Rountev, Louis-Noël Pouchet, Prashant Singh Rawat, Miheer Vaidya
Year of publication: 2019
Subject:
Source: IPDPS
DOI: 10.1109/ipdps.2019.00073
Description: Stencil computations are often the compute-intensive kernel in many scientific applications. With the increasing demand for computational accuracy, and the emergence of massively data-parallel high-bandwidth architectures like GPUs, stencils have steadily become more complex in terms of the stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations on various platforms. However, existing stencil code generators face challenges in optimizing such complex multi-statement stencil DAGs. This paper addresses the challenges in optimizing high-order stencil DAGs on GPUs by focusing on two key considerations: (1) enabling the domain expert to guide the code optimization, which may otherwise be extremely challenging for complex stencils; and (2) using bottleneck analysis via runtime profiling to guide the application of optimizations and the tuning of various code generation parameters. We implement these abstractions in a prototype code generation framework termed Artemis, and evaluate its efficacy on multiple stencil kernels of varying complexity and operational intensity on an NVIDIA P100 GPU.
Database: OpenAIRE