Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems
Author: | José L. Abellán, Yash Ukidave, Saiful A. Mojumder, Norman Rubin, Trinayan Baruah, David Kaeli, Ajay Joshi, Yifan Sun, Ali Tolga Dincer, John Kim |
---|---|
Year of publication: | 2020 |
Subject: | Speedup, Computer science, Locality, Parallel computing, Load balancing (computing), Computer hardware & architecture, Data sharing, Demand paging, Scalability, Electrical engineering / electronic engineering / information engineering, Central processing unit, Programmer, ComputingMethodologies_COMPUTERGRAPHICS |
Source: | HPCA |
DOI: | 10.1109/hpca47549.2020.00055 |
Description: | As transistor scaling becomes increasingly difficult, scaling the core count on a single GPU chip has also become extremely challenging. As the volume of data to process in today's increasingly parallel workloads continues to grow unbounded, we need scalable solutions that can keep up with this increasing demand. To meet the needs of modern-day parallel applications, multi-GPU systems offer a promising path to deliver high performance and large memory capacity. However, multi-GPU systems suffer from performance issues associated with GPU-to-GPU communication and data sharing, which severely limit their benefits. Programming multi-GPU systems has been made considerably simpler with the advent of Unified Memory, which enables runtime migration of pages to the GPU on demand. Current multi-GPU systems rely on a first-touch Demand Paging scheme, in which memory pages are migrated from the CPU to the GPU on the first GPU access to a page. The data-sharing nature of GPU applications makes deploying an efficient programmer-transparent mechanism for inter-GPU page migration challenging. Therefore, following the initial CPU-to-GPU page migration, the page is pinned on that GPU. Future accesses to this page from other GPUs happen at a cache-line granularity: pages are not transferred between GPUs without significant programmer intervention. We observe that this mechanism suffers from two major drawbacks: 1) imbalance in the page distribution across multiple GPUs, and 2) inability to move a page to the GPU that uses it most frequently. Both problems lead to load imbalance across GPUs, degrading the performance of the multi-GPU system. To address these problems, we propose Griffin, a holistic hardware-software solution to improve the performance of NUMA multi-GPU systems.
Griffin introduces programmer-transparent modifications to both the IOMMU and the GPU architecture, supporting efficient runtime page migration based on locality information. In particular, Griffin employs a novel mechanism to detect and move pages between GPUs at runtime, increasing the frequency with which accesses are resolved locally and, in turn, improving performance. To ensure better load balancing across GPUs, Griffin employs a Delayed First-Touch Migration policy that ensures pages are evenly distributed across multiple GPUs. Our results on a diverse set of multi-GPU workloads show that Griffin can achieve up to a 2.9× speedup on a multi-GPU system, while incurring low implementation overhead. |
Database: | OpenAIRE |
External link: |
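The first-touch Demand Paging scheme the abstract criticizes can be illustrated with a toy simulation. This sketch is not from the paper: the trace, function names, and the 4-GPU setup are assumptions chosen only to show how first-touch pinning concentrates pages on whichever GPU races ahead, producing the page-distribution imbalance the authors observe.

```python
# Toy model (assumed, not the paper's code) of first-touch demand paging:
# a page is migrated to the first GPU that accesses it and pinned there;
# later accesses from other GPUs remain remote and never move the page.

from collections import Counter

def first_touch_placement(access_trace):
    """access_trace: list of (gpu_id, page_id) in program order.
    Returns {page_id: owner_gpu} under first-touch pinning."""
    owner = {}
    for gpu, page in access_trace:
        if page not in owner:      # first GPU access migrates + pins the page
            owner[page] = gpu
    return owner

# A skewed trace: GPU 0 runs ahead and touches every page first,
# then GPUs 1-3 access the same pages (remotely, at cache-line granularity).
trace = [(0, p) for p in range(6)] + [(g, p) for p in range(6) for g in (1, 2, 3)]
owner = first_touch_placement(trace)
print(Counter(owner.values()))     # all 6 pages land on GPU 0 -> load imbalance
```

Even though GPUs 1-3 issue most of the accesses, every page is pinned on GPU 0, which is exactly the imbalance that motivates Griffin.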
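The abstract's Delayed First-Touch Migration idea can likewise be sketched in simplified form. The details below (the fixed observation window, the majority-vote placement) are my assumptions for illustration, not Griffin's actual hardware algorithm: the point is only that deferring placement past the very first touch lets the page land on the GPU that actually uses it most.

```python
# Hypothetical sketch of a delayed first-touch policy: instead of pinning
# a page on its very first access, buffer the first few accesses and then
# place the page on the GPU that touched it most often in that window.

from collections import Counter, defaultdict

def delayed_first_touch(access_trace, window=4):
    """access_trace: list of (gpu_id, page_id). Returns {page_id: owner_gpu}."""
    pending = defaultdict(list)    # page -> GPU ids seen before placement
    owner = {}
    for gpu, page in access_trace:
        if page in owner:
            continue               # already placed; remote traffic not modeled
        pending[page].append(gpu)
        if len(pending[page]) >= window:
            # majority vote over the buffered first touches
            owner[page] = Counter(pending[page]).most_common(1)[0][0]
    for page, gpus in pending.items():   # flush pages that never filled a window
        owner.setdefault(page, Counter(gpus).most_common(1)[0][0])
    return owner

# Page 1 is touched once by GPU 0 but three times by GPU 2; vice versa for
# page 2 and GPU 1. Delaying placement sends each page to its heaviest user.
trace = [(0, 1), (2, 1), (2, 1), (2, 1), (0, 2), (1, 2), (1, 2), (1, 2)]
print(delayed_first_touch(trace))  # {1: 2, 2: 1}
```

Under plain first-touch, both pages would have been pinned on GPU 0; the delayed policy instead spreads them across their dominant users, which is the load-balancing effect the abstract attributes to Griffin.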