Minnow
Author: Derek Chiou, Xiaoyu Ma, Michael Thomson, Dan Zhang
Year of publication: 2018
Subject: Instruction prefetch; Speedup; Hardware_MEMORYSTRUCTURES; Computer science; CPU cache; Cache miss; Software engineering; Thread (computing); Parallel computing; Computer Graphics and Computer-Aided Design; Computer hardware and architecture; Scheduling (computing); Scalability; Electrical engineering, electronic engineering, information engineering; Graph (abstract data type); Cache; Software
Source: ASPLOS
ISSN: 1558-1160, 0362-1340
Description: The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low compute intensity, frequent synchronization, uneven task sizes, and dynamic task generation. At high thread counts, execution time is dominated by worklist synchronization overhead and cache misses. Researchers have proposed hardware worklist accelerators to address scheduling costs, but these proposals often harden a specific scheduling policy and do not address high cache miss rates. We address these issues with Minnow, a technique that augments each core in a CMP with a lightweight Minnow accelerator. Minnow engines offload worklist scheduling from worker threads to improve scalability. The engines also perform worklist-directed prefetching, a technique that exploits knowledge of upcoming tasks to issue nearly perfectly accurate and timely prefetch operations. On a simulated 64-core CMP running a parallel graph benchmark suite, Minnow improves scalability and reduces L2 cache misses from 29 to 1.2 MPKI on average, resulting in a 6.01x average speedup over an optimized software baseline for only 1% area overhead.
Database: OpenAIRE
External link:
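
The description above summarizes Minnow's worklist-directed prefetching: because the pending worklist already names the tasks that will run next, their data can be prefetched with near-perfect accuracy rather than speculatively. The sketch below is a minimal software analogue of that idea, not Minnow's hardware engine; the Task type, the vertex_data array, the fixed lookahead depth, and the use of the GCC/Clang __builtin_prefetch builtin are all illustrative assumptions.

```cpp
// Minimal software analogue of worklist-directed prefetching (illustrative,
// not the Minnow hardware mechanism). Requires GCC or Clang for
// __builtin_prefetch.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Task { std::uint32_t vertex; };   // hypothetical task: one vertex id

constexpr std::size_t kLookahead = 8;    // assumed prefetch depth, not from the paper

void drain(std::deque<Task>& worklist, const std::vector<float>& vertex_data) {
    while (!worklist.empty()) {
        // Prefetch the data of tasks that will be scheduled shortly. The
        // addresses come straight from pending worklist entries, so these
        // prefetches target data that will actually be used.
        std::size_t n = worklist.size() < kLookahead ? worklist.size() : kLookahead;
        for (std::size_t i = 1; i < n; ++i)
            __builtin_prefetch(&vertex_data[worklist[i].vertex], 0 /*read*/, 1 /*low reuse*/);

        Task t = worklist.front();
        worklist.pop_front();
        // ... process vertex_data[t.vertex] here, possibly pushing new tasks ...
        (void)t;
    }
}
```

Per the description, the equivalent scheduling and prefetching logic in Minnow runs in a per-core engine rather than in the worker thread itself, which is what removes this work from the worker's critical path.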