Inter-kernel Reuse-aware Thread Block Scheduling

Authors: Sarita V. Adve, Matthew D. Sinclair, Muhammad Huzaifa, Johnathan Alsop, Giordano Salvador, Abdulrahman Mahmoud
Publication year: 2020
Subject:
Source: ACM Transactions on Architecture and Code Optimization. 17:1-27
ISSN: 1544-3973, 1544-3566
DOI: 10.1145/3406538
Description: As GPUs have become more programmable, their performance and energy benefits have made them increasingly popular. However, while GPU compute units continue to improve in performance, on-chip memories lag behind and data accesses are becoming increasingly expensive in performance and energy. Emerging GPU coherence protocols can mitigate this bottleneck by exploiting data reuse in GPU caches across kernel boundaries. Unfortunately, current GPU thread block schedulers are typically not designed to expose such reuse. This article proposes new hardware thread block schedulers that optimize inter-kernel reuse while using work stealing to preserve load balance. Our schedulers are simple, decentralized, and have extremely low overhead. Compared to a baseline round-robin scheduler, the best performing scheduler reduces average execution time and energy by 19% and 11%, respectively, in regular applications, and 10% and 8%, respectively, in irregular applications.
Database: OpenAIRE