GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks

Authors: Hari Subramoni, Jonathan Perkins, Dhabaleswar K. Panda, Khaled Hamidouche, Ammar Ahmad Awan, Akshay Venkatesh
Year of publication: 2015
Source: EuroMPI
Description: As we move towards efficient exascale systems, heterogeneous accelerators such as NVIDIA GPUs are becoming a significant compute component of modern HPC clusters, and it has become important to utilize every cycle of every compute device available in the system. From NICs to GPUs to co-processors, heterogeneous compute resources are the way forward. Another important trend, especially since non-blocking collective communication was introduced in MPI-3, is overlapping communication with computation, which has become an important design goal for messaging libraries such as MVAPICH2 and Open MPI. In this paper, we present a benchmark that allows users of different MPI libraries to evaluate the performance of GPU-Aware Non-Blocking Collectives. The main performance metrics are overlap and latency. We provide insights into designing a GPU-Aware benchmark and discuss the challenges of identifying and implementing performance parameters such as overlap; latency; the effect of MPI_Test() calls used to progress communication; the effect of independent GPU communication while the overlapped computation is in progress; and the effect of the complexity, target, and scale of this overlapped computation. To illustrate the efficacy of the proposed benchmark, we provide a comparative performance evaluation of GPU-Aware Non-Blocking Collectives in MVAPICH2 and Open MPI.
Database: OpenAIRE