Výsledky vyhledávání - "Thakur Rajeev"

Report

Employing Artificial Intelligence to Steer Exascale Workflows with Colmena

Autor: Ward, Logan, Pauloski, J. Gregory, Hayot-Sasson, Valerie, Babuji, Yadu, Brace, Alexander, Chard, Ryan, Chard, Kyle, Thakur, Rajeev, Foster, Ian

Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelis

Externí odkaz: http://arxiv.org/abs/2408.14434

Zobrazit plný text záznamu

Report

MPI Progress For All

Autor: Zhou, Hui, Latham, Robert, Raffenetti, Ken, Guo, Yanfei, Thakur, Rajeev

The progression of communication in the Message Passing Interface (MPI) is not well defined, yet it is critical for application performance, particularly in achieving effective computation and communication overlap. The opaque nature of MPI progress

Externí odkaz: http://arxiv.org/abs/2405.13807

Zobrazit plný text záznamu

Report

Designing and Prototyping Extensions to MPI in MPICH

Autor: Zhou, Hui, Raffenetti, Ken, Guo, Yanfei, Gillis, Thomas, Latham, Robert, Thakur, Rajeev

As HPC system architectures and the applications running on them continue to evolve, the MPI standard itself must evolve. The trend in current and future HPC systems toward powerful nodes with multiple CPU cores and multiple GPU accelerators makes ef

Externí odkaz: http://arxiv.org/abs/2402.12274

Zobrazit plný text záznamu

Report

Frustrated with MPI+Threads? Try MPIxThreads!

Autor: Zhou, Hui, Raffenetti, Ken, Zhang, Junchao, Guo, Yanfei, Thakur, Rajeev

MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP provides a

Externí odkaz: http://arxiv.org/abs/2401.16551

Zobrazit plný text záznamu

Report

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators

Autor: Emani, Murali, Foreman, Sam, Sastry, Varuni, Xie, Zhen, Raskar, Siddhisanket, Arnold, William, Thakur, Rajeev, Vishwanath, Venkatram, Papka, Michael E.

Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems becaus

Externí odkaz: http://arxiv.org/abs/2310.04607

Zobrazit plný text záznamu

Report

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Autor: Huang, Jiajun, Di, Sheng, Yu, Xiaodong, Zhai, Yujia, Liu, Jinyang, Huang, Yafan, Raffenetti, Ken, Zhou, Hui, Zhao, Kai, Lu, Xiaoyi, Chen, Zizhong, Cappello, Franck, Guo, Yanfei, Thakur, Rajeev

GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to seri

Externí odkaz: http://arxiv.org/abs/2308.05199

Zobrazit plný text záznamu

Report

Quantifying the Performance Benefits of Partitioned Communication in MPI

Autor: Gillis, Thomas, Raffenetti, Ken, Zhou, Hui, Guo, Yanfei, Thakur, Rajeev

Partitioned communication was introduced in MPI 4.0 as a user-friendly interface to support pipelined communication patterns, particularly common in the context of MPI+threads. It provides the user with the ability to divide a global buffer into smal

Externí odkaz: http://arxiv.org/abs/2308.03930

Zobrazit plný text záznamu

Report

Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques

Autor: Huang, Jiajun, Ouyang, Kaiming, Zhai, Yujia, Liu, Jinyang, Si, Min, Raffenetti, Ken, Zhou, Hui, Hori, Atsushi, Chen, Zizhong, Guo, Yanfei, Thakur, Rajeev

In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms face performance degradation due to system call overhead, page faults, or data-copy latency, affecti

Externí odkaz: http://arxiv.org/abs/2305.10612

Zobrazit plný text záznamu

Report

An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression

Autor: Huang, Jiajun, Di, Sheng, Yu, Xiaodong, Zhai, Yujia, Zhang, Zhaorui, Liu, Jinyang, Lu, Xiaoyi, Raffenetti, Ken, Zhou, Hui, Zhao, Kai, Chen, Zizhong, Cappello, Franck, Guo, Yanfei, Thakur, Rajeev

With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. The

Externí odkaz: http://arxiv.org/abs/2304.03890

Zobrazit plný text záznamu

Report

Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources

Autor: Ward, Logan, Pauloski, J. Gregory, Hayot-Sasson, Valerie, Chard, Ryan, Babuji, Yadu, Sivaraman, Ganesh, Choudhury, Sutanay, Chard, Kyle, Thakur, Rajeev, Foster, Ian

Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelera

Externí odkaz: http://arxiv.org/abs/2303.08803

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání