Fast Fine-Grained Global Synchronization on GPUs
Authors: | Calvin Lin, Kai Wang, Donald S. Fussell |
---|---|
Publication year: | 2019 |
Subject: | 010302 applied physics; Computer science; Message passing; 02 engineering and technology; Parallel computing; Thread (computing); 01 natural sciences; Synchronization; 020202 computer hardware & architecture; 0103 physical sciences; Scalability; 0202 electrical engineering, electronic engineering, information engineering; General-purpose computing on graphics processing units; Architecture; Software architecture; Scratchpad memory |
Source: | ASPLOS |
Description: | This paper extends the reach of general-purpose GPU programming by presenting a software architecture that supports efficient fine-grained synchronization over global memory. The key idea is to transform global synchronization into global communication so that conflicts are serialized at the thread block level. With this structure, the threads within each thread block can synchronize using low-latency, high-bandwidth local scratchpad memory. To enable this architecture, we implement a scalable and efficient message passing library. Using NVIDIA GTX 1080 Ti GPUs, we evaluate our new software architecture by using it to solve a set of five irregular problems on a variety of workloads. We find that, on average, our solutions improve performance over carefully tuned state-of-the-art solutions by 3.6×. |
Database: | OpenAIRE |
External link: |
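
The key idea in the abstract, turning conflicting global updates into messages that a single owning thread block resolves locally, can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions and not the paper's message passing library: the names `NUM_BLOCKS`, `owner_of`, `route_updates`, `apply_in_block` and `run` are hypothetical, ownership by `key % num_blocks` is an assumed partitioning scheme, and plain Python dictionaries stand in for GPU global and scratchpad memory.

```python
from collections import defaultdict

NUM_BLOCKS = 4  # hypothetical number of thread blocks acting as owners


def owner_of(key, num_blocks=NUM_BLOCKS):
    """Map a shared data item to the single block allowed to update it
    (assumed partitioning: key modulo the number of blocks)."""
    return key % num_blocks


def route_updates(updates, num_blocks=NUM_BLOCKS):
    """Communication phase: instead of synchronizing on global memory,
    each update becomes a message placed in its owner's inbox."""
    inboxes = defaultdict(list)
    for key, delta in updates:
        inboxes[owner_of(key, num_blocks)].append((key, delta))
    return inboxes


def apply_in_block(inbox):
    """Local phase: one block resolves all conflicts on the keys it owns.
    On a GPU this state would sit in scratchpad (shared) memory."""
    local = defaultdict(int)
    for key, delta in inbox:
        local[key] += delta
    return dict(local)


def run(updates, num_blocks=NUM_BLOCKS):
    """Route every update to its owner, then let each block apply its inbox."""
    inboxes = route_updates(updates, num_blocks)
    result = {}
    for block_id in range(num_blocks):
        # Blocks own disjoint key sets, so their results never overlap.
        result.update(apply_in_block(inboxes.get(block_id, [])))
    return result


# Conflicting updates: keys 0 and 5 are each written more than once.
updates = [(0, 1), (5, 2), (0, 3), (7, 1), (5, -2), (4, 10)]
totals = run(updates)
```

Because every key has exactly one owner, updates that would otherwise contend on global memory are only ever combined inside one block, which is where the abstract's low-latency scratchpad synchronization would apply.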