Assessment of NVSHMEM for High Performance Computing
Authors: | Chung-Hsing Hsu, Neena Imam |
---|---|
Publication year: | 2021 |
Subject: | Computer science; Distributed computing; Concurrency; Programming complexity; General Engineering; 02 engineering and technology; Solver; Supercomputer; CUDA; 020204 information systems; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Programming paradigm; Partitioned global address space |
Source: | International Journal of Networking and Computing. 11:78-101 |
ISSN: | 2185-2847 2185-2839 |
DOI: | 10.15803/ijnc.11.1_78 |
Description: | High Performance Computing has been a driving force behind important tasks such as scientific discovery and deep learning. It tends to achieve performance through greater concurrency and heterogeneity, where the underlying complexity of richer topologies is managed through software abstraction. In this paper, we present our assessment of NVSHMEM, an experimental programming library that supports the Partitioned Global Address Space programming model for NVIDIA GPU clusters. NVSHMEM offers several concrete advantages. One is that it reduces overheads and software complexity by allowing communication and computation to be interleaved rather than separated into different phases. Another is that it implements the OpenSHMEM specification to provide efficient fine-grained one-sided communication, removing the overheads due to tag matching, wildcards, and unexpected messages, which have a compounding effect as concurrency increases. It also offers ease of use by abstracting away the low-level configuration operations required to enable low-overhead communication and direct loads and stores across processes. We evaluated NVSHMEM in terms of usability, functionality, and scalability by running two math kernels, matrix multiplication and a Jacobi solver, and one full application, Horovod, on the 27,648-GPU Summit supercomputer. Our exercise of NVSHMEM at scale contributed to making NVSHMEM more robust and preparing it for production release. |
Database: | OpenAIRE |