Compiler Optimization of Accelerator Data Transfers
Authors: | Quinn Snell, Alexander Lemon, David A. Penry, Matthew B. Ashcraft |
---|---|
Year of publication: | 2017 |
Subject: | 010302 applied physics Computer science Byte Optimizing compiler 02 engineering and technology computer.software_genre 01 natural sciences 020202 computer hardware & architecture Theoretical Computer Science Scheduling (computing) CUDA Workflow 0103 physical sciences 0202 electrical engineering electronic engineering information engineering Operating system Compiler Programmer Field-programmable gate array computer Software Information Systems |
Source: | International Journal of Parallel Programming. 47:39-58 |
ISSN: | 1573-7640 0885-7458 |
Description: | Accelerators such as GPUs, FPGAs, and many-core processors can provide significant performance improvements, but their effectiveness depends on the programmer's skill in managing their complex architectures. One area of difficulty is determining which data to transfer to and from the accelerator, and when. Poorly placed data transfers can incur overheads that completely dwarf the benefits of using accelerators. To know what data to transfer, and when, the programmer must understand the data flow of the transferred memory locations throughout the program and how the accelerator region fits into the program as a whole. We argue that compilers should take on the responsibility of data transfer scheduling, thereby reducing the demands on the programmer and improving program performance and efficiency through a reduction in the number of bytes transferred. We show that by performing whole-program scheduling of accelerator data transfers, we can automatically eliminate up to 99% of the bytes transferred to and from the accelerator, compared to transferring all involved data immediately before and after each kernel execution. The analysis and optimization are language- and accelerator-agnostic, but for our examples and testing they have been implemented in an OpenMP to LLVM-IR to CUDA workflow. |
Database: | OpenAIRE |
External link: |
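To make the baseline in the description concrete, the following is a minimal Python sketch, not the authors' implementation. It contrasts copying every variable a kernel touches in before it and out after it with a simple validity-tracking schedule that copies data only when the destination copy is stale. The paper's analysis is a static, whole-program compiler pass; this sketch instead simulates a straight-line sequence of operations at run time. All names (`Kernel`, `HostRead`, `HostWrite`, `naive_bytes`, `scheduled_bytes`) and the byte sizes are hypothetical. The sketch also assumes that a kernel writing a variable overwrites it completely, so write-only variables need no host-to-device copy.

```python
from dataclasses import dataclass

# Hypothetical program operations (illustrative only).
@dataclass(frozen=True)
class Kernel:
    reads: frozenset   # variables the kernel reads
    writes: frozenset  # variables the kernel writes (assumed fully overwritten)

@dataclass(frozen=True)
class HostRead:
    var: str

@dataclass(frozen=True)
class HostWrite:
    var: str

def naive_bytes(program, sizes):
    """Baseline: copy every touched variable in before and out after each kernel."""
    total = 0
    for op in program:
        if isinstance(op, Kernel):
            touched = op.reads | op.writes
            total += 2 * sum(sizes[v] for v in touched)  # H2D + D2H
    return total

def scheduled_bytes(program, sizes):
    """Copy a variable only when the side that needs it holds a stale copy."""
    host_valid = {v: True for v in sizes}   # data starts on the host
    dev_valid = {v: False for v in sizes}
    total = 0
    for op in program:
        if isinstance(op, Kernel):
            for v in op.reads:
                if not dev_valid[v]:
                    total += sizes[v]       # host-to-device copy
                    dev_valid[v] = True
            for v in op.writes:
                dev_valid[v] = True         # device now holds the latest value
                host_valid[v] = False       # host copy is stale
        elif isinstance(op, HostRead):
            if not host_valid[op.var]:
                total += sizes[op.var]      # device-to-host copy
                host_valid[op.var] = True
        elif isinstance(op, HostWrite):
            host_valid[op.var] = True
            dev_valid[op.var] = False       # device copy is stale
    return total

# Hypothetical workload: 10 kernel launches that read a and update b,
# followed by the host reading the result b.
sizes = {"a": 4000, "b": 4000}
step = Kernel(reads=frozenset({"a", "b"}), writes=frozenset({"b"}))
program = [step] * 10 + [HostRead("b")]

print(f"naive: {naive_bytes(program, sizes)} bytes, "
      f"scheduled: {scheduled_bytes(program, sizes)} bytes")
# prints "naive: 160000 bytes, scheduled: 12000 bytes"
```

In this toy workload, the schedule copies `a` and `b` in once, keeps them resident across the loop, and copies `b` back only when the host reads it. A compiler that performs the analysis statically would place the same copies in the generated code rather than tracking validity at run time.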