Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling

Authors:	Po-Han Wang, Hsiang-Yun Cheng, Li-jhan Chen, Chia-Lin Yang
Year of publication:	2017
Subject:	010302 applied physics; Computer science; Locality; 02 engineering and technology; Parallel computing; Thread (computing); 01 natural sciences; Win32 Thread Information Block; 020202 computer hardware & architecture; Instruction set; Shared memory; Hardware and Architecture; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Locality of reference; Cache; Cache algorithms
Source:	IEEE Computer Architecture Letters. 16:127-131
ISSN:	1556-6056
DOI:	10.1109/lca.2017.2693371
Description:	Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, the massive multi-threading of GPGPUs incurs serious cache contention, as the cache lines brought by one thread can easily be evicted by other threads in the small shared cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality among different thread blocks to better utilize the precious cache capacity. Through dynamic locality estimation and thread block scheduling, we can capture more performance improvement opportunities than prior work that only explores the spatial locality between consecutive thread blocks. Evaluations across diverse GPGPU applications show that, on average, our locality-aware scheduler provides 25 and 9 percent performance improvement over the commonly-employed round-robin scheduler and the state-of-the-art scheduler, respectively.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::473a8a20947de24d2a823f0db3cb4d35 https://doi.org/10.1109/lca.2017.2693371