Inter-kernel Reuse-aware Thread Block Scheduling

Authors:	Sarita V. Adve, Matthew D. Sinclair, Muhammad Huzaifa, Johnathan Alsop, Giordano Salvador, Abdulrahman Mahmoud
Year of publication:	2020
Subjects:	Hardware thread; Computer science; Thread (computing); Reuse; Bottleneck; Scheduling (computing); Block scheduling; Kernel (image processing); Hardware and Architecture; Work stealing; Embedded system; Software; Information Systems
Source:	ACM Transactions on Architecture and Code Optimization, vol. 17, pp. 1-27
ISSN:	1544-3973, 1544-3566
DOI:	10.1145/3406538
Abstract:	As GPUs have become more programmable, their performance and energy benefits have made them increasingly popular. However, while GPU compute units continue to improve in performance, on-chip memories lag behind and data accesses are becoming increasingly expensive in performance and energy. Emerging GPU coherence protocols can mitigate this bottleneck by exploiting data reuse in GPU caches across kernel boundaries. Unfortunately, current GPU thread block schedulers are typically not designed to expose such reuse. This article proposes new hardware thread block schedulers that optimize inter-kernel reuse while using work stealing to preserve load balance. Our schedulers are simple, decentralized, and have extremely low overhead. Compared to a baseline round-robin scheduler, the best performing scheduler reduces average execution time and energy by 19% and 11%, respectively, in regular applications, and 10% and 8%, respectively, in irregular applications.
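The abstract does not spell out how the schedulers work, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' design. It assumes that thread block i of each kernel touches the same data as block i of the previous kernel, gives each compute unit (CU) a local queue holding a fixed contiguous chunk of block IDs so that placement repeats from kernel to kernel, and lets an idle CU with an empty queue steal one block from the tail of the longest remaining queue. The helper names (`initial_queues`, `run_kernel`, `reuse_fraction`) and the per-block `costs` model are hypothetical, and the round-robin baseline is not modeled.

```python
from collections import deque
from typing import Dict, List


def initial_queues(num_blocks: int, num_cus: int) -> List[deque]:
    """Give each CU a contiguous chunk of block IDs (hypothetical policy).

    Because the same chunk maps to the same CU in every kernel, a block
    that re-reads its predecessor's data can hit in that CU's cache.
    """
    chunk = -(-num_blocks // num_cus)  # ceiling division
    return [deque(range(c * chunk, min((c + 1) * chunk, num_blocks)))
            for c in range(num_cus)]


def run_kernel(costs: List[int], num_cus: int) -> Dict[int, int]:
    """Simulate one kernel and return {block_id: CU that ran it}.

    costs[b] is the (assumed) run time of block b in abstract time steps.
    """
    queues = initial_queues(len(costs), num_cus)
    busy_until = [0] * num_cus
    placement: Dict[int, int] = {}
    time = 0
    while len(placement) < len(costs):
        for cu in range(num_cus):
            if busy_until[cu] > time:
                continue  # still executing its current block
            if queues[cu]:
                block = queues[cu].popleft()  # local work first
            else:
                # Work stealing: take from the tail of the longest queue,
                # leaving the victim's next (head) blocks in place.
                victim = max(range(num_cus), key=lambda v: len(queues[v]))
                if not queues[victim]:
                    continue  # no work left anywhere
                block = queues[victim].pop()
            placement[block] = cu
            busy_until[cu] = time + costs[block]
        time += 1
    return placement


def reuse_fraction(prev: Dict[int, int], curr: Dict[int, int]) -> float:
    """Fraction of blocks that ran on the same CU as in the previous kernel."""
    same = sum(1 for b, cu in curr.items() if prev.get(b) == cu)
    return same / len(curr)


# Kernel 1 is uniform; in kernel 2, blocks 4-7 (all owned by CU 1) are slow.
k1 = run_kernel([1] * 16, num_cus=4)
k2 = run_kernel([1] * 4 + [4] * 4 + [1] * 8, num_cus=4)
print(reuse_fraction(k1, k2))  # prints 0.875
```

In this toy run, CU 1 would otherwise execute all four slow blocks alone; stealing moves only blocks 7 and 6 to CUs 0 and 2, so 14 of the 16 blocks still run on the CU that executed them in the previous kernel. Stealing from the tail is one plausible choice for keeping a victim's upcoming blocks local; the actual policy in the article may differ.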
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::806a03c1f720d78b5e4e138b34a82b21, https://doi.org/10.1145/3406538