Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Author: De Sensi, Daniele; Pichetti, Lorenzo; Vella, Flavio; De Matteis, Tiziano; Ren, Zebin; Fusco, Luigi; Turisini, Matteo; Cesarini, Daniele; Lust, Kurt; Trivedi, Animesh; Roweth, Duncan; Spiga, Filippo; Di Girolamo, Salvatore; Hoefler, Torsten
Year of publication: 2024
Subject:
Source: Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '24) (2024)
Document type: Working Paper
Description: Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency are challenging due to different technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth, and there are still many opportunities for optimization, ranging from network to software optimization.
Database: arXiv