Showing 1 - 10 of 636 for search: '"Guo, YanFei"'
The progression of communication in the Message Passing Interface (MPI) is not well defined, yet it is critical for application performance, particularly in achieving effective computation and communication overlap. The opaque nature of MPI progress …
External link:
http://arxiv.org/abs/2405.13807
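Not taken from the paper — a minimal sketch of the overlap pattern the abstract refers to: a nonblocking ring exchange whose requests are polled with MPI_Test-family calls so the library can make progress even without an asynchronous progress thread. The ring pattern and buffer size are illustrative choices.

```c
/* Minimal sketch (not from the paper): overlapping local computation with a
 * nonblocking ring exchange, polling MPI_Testall so the MPI library gets
 * chances to progress communication. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                      /* doubles per message */
    double *sendbuf = malloc(n * sizeof *sendbuf);
    double *recvbuf = malloc(n * sizeof *recvbuf);
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    /* Start the exchange without blocking. */
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    int done = 0;
    while (!done) {
        /* ... a slice of independent local computation goes here ... */

        /* Each test call gives the implementation an opportunity to advance
         * the outstanding transfers; some implementations make little
         * progress outside of MPI routines. */
        MPI_Testall(2, reqs, &done, MPI_STATUSES_IGNORE);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```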
As HPC system architectures and the applications running on them continue to evolve, the MPI standard itself must evolve. The trend in current and future HPC systems toward powerful nodes with multiple CPU cores and multiple GPU accelerators makes …
External link:
http://arxiv.org/abs/2402.12274
MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP provides a …
External link:
http://arxiv.org/abs/2401.16551
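A minimal MPI+OpenMP sketch of the hybrid model the snippet describes, assuming an MPI library that provides MPI_THREAD_MULTIPLE; the reduction workload is purely illustrative and not from the paper.

```c
/* Minimal MPI+OpenMP sketch (not from the paper): OpenMP threads handle
 * on-node shared-memory parallelism, MPI handles the inter-node part. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request full thread support so any OpenMP thread may call MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long local_sum = 0;
    /* On-node shared-memory parallelization via OpenMP. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < 1000000; i++)
        local_sum += i % 7;

    /* Multi-node distributed-memory reduction via MPI. */
    long global_sum = 0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_LONG, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %ld\n", global_sum);

    MPI_Finalize();
    return 0;
}
```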
The MPI Forum has recently adopted a Python scripting engine for generating the API text in the standard document. As a by-product, it made available reliable and rich descriptions of all MPI functions that are suited for scripting tools. Using these …
External link:
http://arxiv.org/abs/2401.16547
Author:
Huang, Jiajun, Di, Sheng, Yu, Xiaodong, Zhai, Yujia, Liu, Jinyang, Huang, Yafan, Raffenetti, Ken, Zhou, Hui, Zhao, Kai, Lu, Xiaoyi, Chen, Zizhong, Cappello, Franck, Guo, Yanfei, Thakur, Rajeev
GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to serious …
External link:
http://arxiv.org/abs/2308.05199
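For context only — a generic sketch of the "compress, exchange, decompress" pattern the snippet alludes to, not the paper's design. compress_block() and decompress_block() are hypothetical stand-ins for a real (GPU) lossy compressor; here they merely copy bytes so the example compiles.

```c
/* Generic sketch (NOT the paper's scheme): exchange compressed payloads in
 * place of raw data inside an allgather-style collective. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compressor stand-in: returns the compressed byte count. */
static int compress_block(const double *in, int n, char *out)
{
    memcpy(out, in, n * sizeof *in);
    return (int)(n * sizeof *in);
}

/* Hypothetical decompressor stand-in. */
static void decompress_block(const char *in, int bytes, double *out)
{
    memcpy(out, in, bytes);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 4096;                          /* doubles per rank */
    double *local = calloc(n, sizeof *local);
    char *packed  = malloc(n * sizeof(double));

    /* 1. Compress the local contribution before it enters the collective. */
    int my_bytes = compress_block(local, n, packed);

    /* 2. Exchange the (variable) compressed sizes. */
    int *bytes = malloc(size * sizeof *bytes);
    MPI_Allgather(&my_bytes, 1, MPI_INT, bytes, 1, MPI_INT, MPI_COMM_WORLD);

    int *displs = malloc(size * sizeof *displs);
    int total = 0;
    for (int i = 0; i < size; i++) { displs[i] = total; total += bytes[i]; }

    /* 3. Exchange the compressed payloads instead of the raw data. */
    char *all_packed = malloc(total);
    MPI_Allgatherv(packed, my_bytes, MPI_BYTE,
                   all_packed, bytes, displs, MPI_BYTE, MPI_COMM_WORLD);

    /* 4. Decompress every block locally. */
    double *gathered = malloc((size_t)size * n * sizeof *gathered);
    for (int i = 0; i < size; i++)
        decompress_block(all_packed + displs[i], bytes[i],
                         gathered + (size_t)i * n);

    free(local); free(packed); free(bytes); free(displs);
    free(all_packed); free(gathered);
    MPI_Finalize();
    return 0;
}
```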
Partitioned communication was introduced in MPI 4.0 as a user-friendly interface to support pipelined communication patterns, particularly common in the context of MPI+threads. It provides the user with the ability to divide a global buffer into smaller …
External link:
http://arxiv.org/abs/2308.03930
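A minimal sketch of the MPI 4.0 partitioned-communication pattern described above, assuming a library that implements the MPI 4.0 calls; partition counts and sizes are illustrative, and the example is not drawn from the paper.

```c
/* Sketch of MPI 4.0 partitioned communication: one global buffer divided
 * into partitions that are marked ready (and may be sent) as they are
 * produced, pipelining transfer with computation. */
#include <mpi.h>
#include <stdlib.h>

#define PARTITIONS     8
#define COUNT_PER_PART 1024        /* doubles per partition */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(PARTITIONS * COUNT_PER_PART * sizeof *buf);
    MPI_Request req;

    if (rank == 0) {
        /* Declare the global buffer as PARTITIONS partitions. */
        MPI_Psend_init(buf, PARTITIONS, COUNT_PER_PART, MPI_DOUBLE,
                       1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);

        for (int p = 0; p < PARTITIONS; p++) {
            /* ... fill partition p (e.g., from a thread or task) ... */
            /* Mark the partition ready; the library may transfer it
             * immediately while later partitions are still being filled. */
            MPI_Pready(p, req);
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    } else if (rank == 1) {
        MPI_Precv_init(buf, PARTITIONS, COUNT_PER_PART, MPI_DOUBLE,
                       0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* or poll MPI_Parrived per partition */
        MPI_Request_free(&req);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```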
Author:
Huang, Jiajun, Ouyang, Kaiming, Zhai, Yujia, Liu, Jinyang, Si, Min, Raffenetti, Ken, Zhou, Hui, Hori, Atsushi, Chen, Zizhong, Guo, Yanfei, Thakur, Rajeev
In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms face performance degradation due to system call overhead, page faults, or data-copy latency, affecting …
External link:
http://arxiv.org/abs/2305.10612
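Not the paper's algorithm — for context, a minimal sketch of the standard MPI-3 shared-memory mechanism (MPI_Win_allocate_shared) that on-node collective designs commonly build on to avoid extra data copies between ranks on the same node.

```c
/* Context sketch (not the paper's method): ranks on one node expose a
 * shared segment and read each other's data with plain loads, avoiding
 * extra copies through intermediate buffers. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Each rank contributes one double to a node-wide shared segment. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                            node, &mine, &win);
    *mine = (double)nrank;

    MPI_Win_fence(0, win);              /* make every rank's write visible */

    /* Node-local rank 0 reads the peers' contributions directly. */
    if (nrank == 0) {
        double sum = 0.0;
        for (int r = 0; r < nsize; r++) {
            MPI_Aint sz; int disp; double *ptr;
            MPI_Win_shared_query(win, r, &sz, &disp, &ptr);
            sum += *ptr;
        }
        printf("node-local sum = %g\n", sum);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}
```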
Author:
Huang, Jiajun, Di, Sheng, Yu, Xiaodong, Zhai, Yujia, Zhang, Zhaorui, Liu, Jinyang, Lu, Xiaoyi, Raffenetti, Ken, Zhou, Hui, Zhao, Kai, Chen, Zizhong, Cappello, Franck, Guo, Yanfei, Thakur, Rajeev
With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. The …
External link:
http://arxiv.org/abs/2304.03890
The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained prominence in the high-performance computing arena. This corresponds to a trend of system architectures growing more heterogeneous. The current MPI standard only specifies …
External link:
http://arxiv.org/abs/2208.13707
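A sketch of the GPU side of MPI+X (not from the paper), under the assumption of a GPU-aware MPI build (e.g., MPICH or Open MPI configured with CUDA support): a device buffer is handed directly to MPI, so the library can move the data without an explicit host staging copy. Without such a build, the buffer would first have to be copied to host memory.

```c
/* Sketch: passing a CUDA device buffer straight to MPI point-to-point
 * calls, assuming a GPU-aware (CUDA-aware) MPI library. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));
    cudaMemset(d_buf, 0, n * sizeof(double));

    if (rank == 0) {
        /* Device pointer handed directly to MPI_Send. */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```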
Author:
Zhi, Zhenzhen, Guo, Yanfei, Qi, Huahui, Tan, Hongbo, Jin, Zihao, Wang, Yujiang, Su, Ying, Ma, Baoguo
Published in:
In Materials Today Communications, December 2024, 41