Compiler transformation of nested loops for general purpose GPUs

Authors: Sunita Chandrasekaran, Xiaonan Tian, Barbara Chapman, Rengan Xu, Yonghong Yan, Deepak Eachempati
Year of publication: 2015
Subject:
Source: Concurrency and Computation: Practice and Experience. 28:537-556
ISSN: 1532-0626
DOI: 10.1002/cpe.3648
Description: Manycore accelerators have the potential to significantly improve the performance of scientific applications when computationally intensive portions of a program are offloaded to them. Directive-based high-level programming models, such as OpenACC and OpenMP, are used to create applications for accelerators by annotating the regions of code meant for offloading. OpenACC is an emerging directive-based programming model for accelerators that typically enables inexperienced programmers to achieve portable and productive performance within applications. In this paper, we present the challenges we encountered and the solutions we developed when creating an open-source OpenACC compiler within the industrial framework OpenUH, a branch of Open64. We then discuss in detail the techniques we developed for loop scheduling and reduction operations on general purpose GPUs. The compiler is evaluated with benchmarks from the NAS Parallel Benchmarks suite and self-written micro-benchmarks for reduction operations. This implementation has been designed to serve as a compiler infrastructure for researchers to explore advanced compiler techniques, extend OpenACC to other programming models, and build performance tools used in conjunction with OpenACC programs. Copyright © 2015 John Wiley & Sons, Ltd.
Database: OpenAIRE