Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling
Authors: | Po-Han Wang, Hsiang-Yun Cheng, Li-jhan Chen, Chia-Lin Yang |
---|---|
Year of publication: | 2017 |
Subject: | 010302 applied physics; Computer science; Locality; 02 engineering and technology; Parallel computing; Thread (computing); 01 natural sciences; Win32 Thread Information Block; 020202 computer hardware & architecture; Instruction set; Shared memory; Hardware and Architecture; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Locality of reference; Cache; Cache algorithms |
Source: | IEEE Computer Architecture Letters, vol. 16, pp. 127-131 |
ISSN: | 1556-6056 |
DOI: | 10.1109/lca.2017.2693371 |
Description: | Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, this massive multithreading causes severe cache contention: cache lines brought in by one thread can easily be evicted by other threads sharing the small cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality among different thread blocks to make better use of the limited cache capacity. Through dynamic locality estimation and thread block scheduling, our approach captures more performance improvement opportunities than prior work, which exploits only the spatial locality between consecutive thread blocks. Evaluations across diverse GPGPU applications show that, on average, our locality-aware scheduler improves performance by 25 and 9 percent over the commonly used round-robin scheduler and the state-of-the-art scheduler, respectively. |
Database: | OpenAIRE |
External link: |
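To make the contrast between round-robin and locality-aware thread block (TB) placement concrete, the toy Python model below may help. It is a minimal sketch under stated assumptions, not the authors' scheduler. The access pattern, the rule of grouping TBs with identical footprints, the per-SM LRU cache, and every function name (`tb_footprint`, `round_robin`, `locality_aware`, `simulate`) are hypothetical illustrations. In particular, the sketch assumes footprints are known up front, whereas the paper estimates locality dynamically. In the modeled pattern, TBs in the same grid column share data even though their linear IDs are not consecutive, which is the kind of non-consecutive locality the abstract refers to.

```python
from collections import OrderedDict


def tb_footprint(tb_id, grid_w, lines_per_tb):
    """Hypothetical access pattern: TB (x, y) reads a band of cache lines
    determined only by its column x, so TBs grid_w IDs apart share data."""
    x = tb_id % grid_w
    return [x * lines_per_tb + k for k in range(lines_per_tb)]


class LRUCache:
    """Assumed model of one SM's small cache: fully associative, LRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, line):
        self.accesses += 1
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def round_robin(num_tbs, num_sms):
    """Baseline: TB i goes to SM i mod num_sms."""
    return [tb % num_sms for tb in range(num_tbs)]


def locality_aware(num_tbs, num_sms, footprint):
    """Hypothetical grouping: TBs with identical footprints form one group,
    and each group is pinned to a single SM (groups spread round-robin)."""
    groups = {}  # insertion-ordered: groups appear in order of first TB
    for tb in range(num_tbs):
        groups.setdefault(frozenset(footprint(tb)), []).append(tb)
    assignment = [0] * num_tbs
    for g, tbs in enumerate(groups.values()):
        for tb in tbs:
            assignment[tb] = g % num_sms
    return assignment


def simulate(assignment, footprint, num_sms, capacity):
    """Replay TBs in ID order on their assigned SMs; return the hit rate."""
    caches = [LRUCache(capacity) for _ in range(num_sms)]
    for tb, sm in enumerate(assignment):
        for line in footprint(tb):
            caches[sm].access(line)
    hits = sum(c.hits for c in caches)
    total = sum(c.accesses for c in caches)
    return hits / total


if __name__ == "__main__":
    # 6 x 8 grid of TBs, 4 SMs, 4 lines per TB, 8-line cache per SM
    fp = lambda tb: tb_footprint(tb, 6, 4)
    rr = simulate(round_robin(48, 4), fp, 4, 8)
    la = simulate(locality_aware(48, 4, fp), fp, 4, 8)
    print(f"round-robin hit rate:    {rr:.3f}")  # 0.000
    print(f"locality-aware hit rate: {la:.3f}")  # 0.875
```

With these parameters, round-robin sends three distinct column footprints to every SM, which cyclically thrash the 8-line LRU cache. The grouped placement gives each SM at most two footprints that fit. The trade-off is load balance: here two SMs receive twice as many TBs as the others, which a practical scheduler would have to weigh against the locality gained.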