Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling

Authors: Po-Han Wang, Hsiang-Yun Cheng, Li-jhan Chen, Chia-Lin Yang
Year of publication: 2017
Subject:
Source: IEEE Computer Architecture Letters. 16:127-131
ISSN: 1556-6056
DOI: 10.1109/lca.2017.2693371
Description: Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, the massive multi-threading of GPGPUs incurs serious cache contention, as the cache lines brought by one thread can easily be evicted by other threads in the small shared cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality among different thread blocks to better utilize the precious cache capacity. Through dynamic locality estimation and thread block scheduling, we can capture more performance improvement opportunities than prior work that only explores the spatial locality between consecutive thread blocks. Evaluations across diverse GPGPU applications show that, on average, our locality-aware scheduler provides 25 and 9 percent performance improvement over the commonly-employed round-robin scheduler and the state-of-the-art scheduler, respectively.
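The abstract does not spell out the scheduling algorithm, so the following is only a hypothetical Python sketch of the general idea: contrast a round-robin dispatcher with a greedy dispatcher that co-locates thread blocks sharing cache lines on the same SM. It assumes each block's cache-line footprint is known in advance. The paper instead estimates locality dynamically, which this sketch does not model. It also ignores timing and eviction. The names `round_robin`, `locality_aware`, `overlap`, and `intra_sm_sharing` are illustrative, not from the paper. In the toy grid, block `b` shares lines with block `b + 3`, a non-consecutive pair that a scheme grouping only consecutive block IDs would miss.

```python
from itertools import combinations

def round_robin(num_blocks, num_sms):
    """Baseline: thread block i is dispatched to SM (i mod num_sms)."""
    sms = [[] for _ in range(num_sms)]
    for b in range(num_blocks):
        sms[b % num_sms].append(b)
    return sms

def overlap(footprints, a, b):
    """Hypothetical locality estimate: cache lines touched by both blocks."""
    return len(footprints[a] & footprints[b])

def locality_aware(footprints, num_sms):
    """Greedy sketch (not the authors' method): place each block on the SM
    whose resident blocks share the most cache lines with it, capping the
    per-SM block count to keep the load balanced."""
    num_blocks = len(footprints)
    cap = -(-num_blocks // num_sms)  # ceiling division: max blocks per SM
    sms = [[] for _ in range(num_sms)]
    for b in range(num_blocks):
        candidates = [s for s in range(num_sms) if len(sms[s]) < cap]
        # Prefer higher shared-line count; break ties toward emptier SMs.
        best = max(candidates,
                   key=lambda s: (sum(overlap(footprints, b, o) for o in sms[s]),
                                  -len(sms[s])))
        sms[best].append(b)
    return sms

def intra_sm_sharing(footprints, sms):
    """Total cache lines shared by block pairs co-located on the same SM."""
    return sum(overlap(footprints, a, b)
               for blocks in sms for a, b in combinations(blocks, 2))

# Toy 3x2 grid of blocks (row-major IDs). Blocks in the same column
# (b and b + 3) share 4 lines; every block also has one private line.
footprints = [{(b % 3) * 10 + k for k in range(4)} | {1000 + b} for b in range(6)]
rr = round_robin(6, 2)
la = locality_aware(footprints, 2)
print(rr, intra_sm_sharing(footprints, rr))  # [[0, 2, 4], [1, 3, 5]] 0
print(la, intra_sm_sharing(footprints, la))  # [[0, 2, 3], [1, 4, 5]] 8
```

In this toy case, round-robin never places a column pair on the same SM, while the greedy placement co-locates two of the three pairs. The per-SM cap prevents the third pair from also being co-located.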
Database: OpenAIRE