Author: |
De Sensi, Daniele; Pichetti, Lorenzo; Vella, Flavio; De Matteis, Tiziano; Ren, Zebin; Fusco, Luigi; Turisini, Matteo; Cesarini, Daniele; Lust, Kurt; Trivedi, Animesh; Roweth, Duncan; Spiga, Filippo; Di Girolamo, Salvatore; Hoefler, Torsten |
Publication year: |
2024 |
Subject: |
|
Source: |
Published in Proceedings of The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '24) (2024) |
Document type: |
Working Paper |
Description: |
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths of up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency are challenging because of the variety of technologies, design options, and software layers involved. This paper comprehensively characterizes three supercomputers (Alps, Leonardo, and LUMI), each with its own architecture and design. We focus on evaluating the performance of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers working on multi-GPU supercomputing. Our results show that bandwidth remains untapped and that many opportunities for optimization remain, ranging from the network to the software. |
Database: |
arXiv |