Optimizing Barrier Algorithms on Asymmetric Subsystems of NUMA Machines
Authors: | Elizaveta Tokmasheva, Mikhail G. Kurnosov |
---|---|
Year of publication: | 2021 |
Subject: | Tree (data structure); Multi-core processor; Hardware_MEMORYSTRUCTURES; Computer science; Node (networking); Rank (computer programming); Synchronization (computer science); Process (computing); Software_PROGRAMMINGTECHNIQUES; Hardware_ARITHMETICANDLOGICSTRUCTURES; Latency (engineering); Cluster analysis; Algorithm |
Source: | 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). |
Description: | This paper investigates algorithms for barrier synchronization in MPI applications on HPC clusters of NUMA machines. We consider the case in which all MPI processes that need to be synchronized reside on the same multi-socket NUMA machine. This problem arises, in particular, in hierarchical (topology-aware) barriers. Barrier algorithms for SMP/NUMA systems use shared counters and flags in memory so that processes can communicate with each other. To minimize barrier latency, it is important to place the shared counters and flags in the memory of the NUMA node that has the minimal total distance to the other NUMA nodes in use. We propose the MinNumaDist algorithm for choosing the root process, which allocates the shared flags and counters in the memory of its NUMA node. The algorithm selects the root rank with the minimal total distance from its NUMA node to the NUMA nodes of all remaining processes. It reduces barrier synchronization time on asymmetric subsystems of processor cores (where NUMA nodes and processor sockets have different numbers of assigned processes). Our experiments on dual-socket NUMA machines show that MinNumaDist decreases the latency of centralized barrier algorithms (central counter, flat tree, flat tree gather/release) by 10-170% on average. |
Database: | OpenAIRE |
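
The root-selection rule stated in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `min_numa_dist_root`, the `rank_to_node` list, the example distance matrix (values in the style reported by `numactl --hardware`), and the tie-break toward the lowest rank are all assumptions made for illustration. The abstract only says that the chosen root minimizes the total distance from its NUMA node to the NUMA nodes of all remaining processes.

```python
def min_numa_dist_root(rank_to_node, distance):
    """Pick a root rank by minimal total NUMA distance (illustrative sketch).

    rank_to_node[r] is the NUMA node hosting rank r (hypothetical input).
    distance[a][b] is the distance from NUMA node a to NUMA node b.
    Ties are broken toward the lowest rank, which is an assumption.
    """
    best_rank, best_cost = None, None
    for r, node_r in enumerate(rank_to_node):
        # Sum the distances from this rank's node to the nodes of all other ranks.
        cost = sum(
            distance[node_r][node_q]
            for q, node_q in enumerate(rank_to_node)
            if q != r
        )
        if best_cost is None or cost < best_cost:
            best_rank, best_cost = r, cost
    return best_rank


# Asymmetric placement: one process on node 0, three processes on node 1.
example_distance = [[10, 21],
                    [21, 10]]  # assumed values for a dual-socket machine
root = min_numa_dist_root([0, 1, 1, 1], example_distance)
print(root)  # 1: a rank on node 1 costs 21 + 10 + 10 = 41, rank 0 costs 3 * 21 = 63
```

In this asymmetric case a default choice of rank 0 would place the shared counters and flags on the node that hosts only one process, whereas the sketch moves them to the node where most processes run. With a symmetric placement such as `[0, 0, 1, 1]` all ranks have the same total distance, so the choice of root makes no difference under this cost model.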