On Optimizing Complex Stencils on GPUs
Authors: | Aravind Sukumaran-Rajam, P. Sadayappan, Atanas Rountev, Louis-Noël Pouchet, Prashant Singh Rawat, Miheer Vaidya |
---|---|
Year of publication: | 2019 |
Subject: | 010302 applied physics; Profiling (computer programming); Stencil code; Computer science; 020207 software engineering; 02 engineering and technology; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Program optimization; 01 natural sciences; Stencil; Bottleneck; CUDA; Kernel (image processing); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Code generation; General-purpose computing on graphics processing units; ComputingMethodologies_COMPUTERGRAPHICS |
Source: | IPDPS (IEEE International Parallel and Distributed Processing Symposium) |
DOI: | 10.1109/ipdps.2019.00073 |
Description: | Stencil computations are often the compute-intensive kernels of scientific applications. With the increasing demand for computational accuracy and the emergence of massively data-parallel, high-bandwidth architectures such as GPUs, stencils have steadily become more complex in terms of stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations on various platforms; however, existing stencil code generators struggle to optimize complex multi-statement stencil DAGs. This paper addresses the challenges of optimizing high-order stencil DAGs on GPUs by focusing on two key considerations: (1) enabling the domain expert to guide code optimization, which may otherwise be extremely challenging for complex stencils; and (2) using bottleneck analysis via runtime profiling to guide both the application of optimizations and the tuning of various code generation parameters. We implement these abstractions in a prototype code generation framework termed Artemis and evaluate its efficacy on multiple stencil kernels of varying complexity and operational intensity on an NVIDIA P100 GPU. |
Database: | OpenAIRE |
External link: |
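The description refers to stencil order and to multi-statement stencil DAGs. As a hedged illustration only (this is not Artemis input or output and not code from the paper), the Python sketch below implements a radius-2 star stencil, which computes the standard 4th-order finite-difference Laplacian, and chains two such statements so that the second consumes the output of the first. The names `R`, `C`, `laplacian`, and `two_stage` are hypothetical, chosen for this example. Note how the valid region of the second statement shrinks by the stencil radius; this hints at why halo handling and data reuse become harder as stencil order and DAG depth grow.

```python
# Hypothetical illustration, not Artemis code: a radius-2 (4th-order) 2D star
# stencil and a two-statement stencil DAG, on a plain Python list-of-lists grid.

R = 2  # stencil radius; a radius-2 star gives a 4th-order Laplacian

# 4th-order finite-difference Laplacian weights for unit grid spacing:
# C[0] is the center weight (both axes combined), C[k] the weight at offset +/-k.
C = [-5.0, 4.0 / 3.0, -1.0 / 12.0]


def laplacian(grid, lo, hi):
    """One stencil statement. Return a new grid holding the Laplacian of
    `grid` at points with lo <= i, j < hi and 0.0 everywhere else.
    All reads stay in bounds when lo >= R and hi <= len(grid) - R."""
    n = len(grid)
    out = [[0.0] * n for _ in range(n)]
    for i in range(lo, hi):
        for j in range(lo, hi):
            acc = C[0] * grid[i][j]
            for k in range(1, R + 1):
                # Four neighbors at distance k along the two axes.
                acc += C[k] * (grid[i - k][j] + grid[i + k][j]
                               + grid[i][j - k] + grid[i][j + k])
            out[i][j] = acc
    return out


def two_stage(grid):
    """A two-statement stencil DAG: stage 2 consumes stage 1's output.
    Stage 1 is valid on [R, n - R); stage 2 reads R points around each
    output point, so its valid region shrinks to [2R, n - 2R)."""
    n = len(grid)
    stage1 = laplacian(grid, R, n - R)
    stage2 = laplacian(stage1, 2 * R, n - 2 * R)
    return stage1, stage2


if __name__ == "__main__":
    n = 16
    # f(i, j) = i^2 + j^2 has Laplacian exactly 4; a 4th-order stencil
    # reproduces this up to floating-point rounding, and the Laplacian of
    # that (constant) result is approximately 0.
    f = [[float(i * i + j * j) for j in range(n)] for i in range(n)]
    s1, s2 = two_stage(f)
    print(f"stage 1 at center: {s1[8][8]:.6f}, stage 2 at center: {s2[8][8]:.6f}")
```

Plain nested loops keep the dependence structure visible; a GPU code generator would instead map these loops to threads and decide, among other things, whether to fuse the two statements or materialize the intermediate grid.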