Showing 1 - 10 of 18 for search: '"Hari Subramoni"'
Author:
Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
Published in:
IEEE Micro. 43:131-139
Author:
Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
Published in:
Journal of Computer Science and Technology. 38:128-145
Published in:
IEEE Micro. 42:53-60
Author:
Sourav Chakraborty, Jahanzeb Maqbool Hashmi, Hari Subramoni, Mohammadreza Bayatpour, Ching-Hsiang Chu, Dhabaleswar K. Panda
Published in:
Journal of Parallel and Distributed Computing. 144:1-13
This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X, a Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. …
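For context, the non-contiguous communication that MPI derived datatypes describe can be shown with a minimal generic-MPI sketch (plain MPI, not the FALCON-X framework itself): a strided matrix column packed by the library via MPI_Type_vector.

/* Minimal sketch: sending one non-contiguous matrix column with an
 * MPI derived datatype (generic MPI, not the FALCON-X framework). */
#include <mpi.h>

#define ROWS 4
#define COLS 8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double matrix[ROWS][COLS] = {{0}};
    MPI_Datatype column;
    /* ROWS blocks of 1 double, strided COLS doubles apart: a column. */
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0)
        MPI_Send(&matrix[0][2], 1, column, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&matrix[0][2], 1, column, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}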
Published in:
IEEE Micro. 40:35-43
Heterogeneous high-performance computing systems with GPUs are equipped with high-performance interconnects like InfiniBand, Omni-Path, PCIe, and NVLink. However, little exists in the literature that captures the performance impact of these interconnects…
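For reference, an interconnect's performance impact is typically quantified with point-to-point micro-benchmarks; a minimal MPI ping-pong sketch (illustrative only, not the benchmark suite evaluated in the paper) might look like:

/* Minimal ping-pong bandwidth sketch between ranks 0 and 1
 * (illustrative only; not the benchmark suite from the paper). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 100;
    const int bytes = 1 << 20;               /* 1 MiB messages */
    char *buf = malloc(bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;
    if (rank == 0)  /* two transfers per iteration (there and back) */
        printf("bandwidth: %.1f MB/s\n",
               2.0 * iters * bytes / elapsed / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}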
Author:
Sourav Chakraborty, Hari Subramoni, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda, Amit Ruhela
Published in:
Parallel Computing. 85:13-26
The overlap of computation and communication is critical for good performance of many HPC applications. State-of-the-art designs for asynchronous progress require specially designed hardware resources (advanced switches or network interface cards)…
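For background, the overlap in question is the classic nonblocking-MPI pattern sketched below (generic MPI; whether the transfer actually progresses during the compute loop is exactly the asynchronous-progress problem the paper studies):

/* Sketch of computation/communication overlap with nonblocking MPI.
 * Real overlap depends on the library's asynchronous progress. */
#include <mpi.h>

#define N 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1;  /* pair up ranks: 0<->1, 2<->3, ... */

    static double sendbuf[N], recvbuf[N];
    MPI_Request reqs[2];
    MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Independent computation placed in the overlap window. */
    double acc = 0.0;
    for (long i = 0; i < 10000000; i++) acc += (double)i * 1e-9;

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return acc < 0.0;  /* keep the compute result observable */
}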
Author:
Ammar Ahmad Awan, Karthik Vadambacheri Manian, Dhabaleswar K. Panda, Hari Subramoni, Ching-Hsiang Chu
Published in:
Parallel Computing. 85:141-152
Traditionally, MPI runtimes have been designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and GPU clusters with a relatively smaller number of nodes, efficient communication schemes need to be designed…
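A minimal sketch of the MPI+CUDA pattern in question, assuming a CUDA-aware MPI build that accepts device pointers directly (a non-CUDA-aware build would instead need a staging copy through host memory):

/* Sketch: passing a GPU buffer straight to MPI. Assumes a CUDA-aware
 * MPI library; otherwise stage the buffer through host memory first. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}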
Author:
Amit Ruhela, Dhabaleswar K. Panda, Srinivasan Ramesh, Hari Subramoni, Allen D. Malony, Aurèle Mahéo, Sameer Shende
Published in:
Parallel Computing. 77:19-37
The desire for high performance on scalable parallel systems is increasing the complexity and tunability of MPI implementations. The MPI Tools Information Interface (MPI_T), introduced as part of the MPI 3.0 standard, provides an opportunity for performance…
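For context, MPI_T is a concrete API in the MPI standard; a minimal sketch that initializes the tools interface and counts the control variables the library exposes:

/* Sketch: querying the MPI Tools Information Interface (MPI_T) for
 * the number of control variables the MPI library exposes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, num_cvars;
    /* MPI_T may be initialized before (and independently of) MPI. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_cvar_get_num(&num_cvars);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI_T exposes %d control variables\n", num_cvars);

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}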
Author:
Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda, Ching-Hsiang Chu, Chen-Chun Chen, Kawthar Shafie Khorassani
Published in:
Lecture Notes in Computer Science ISBN: 9783030787127
ISC
Due to the emergence of AMD GPUs and their adoption in upcoming exascale systems (e.g., Frontier), it is pertinent to have scientific applications and communication middlewares ported and optimized for these systems. The Radeon Open Compute (ROCm) platform…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::7b121328f3fe8240f55047725cb698e2
https://doi.org/10.1007/978-3-030-78713-4_7
Author:
Hari Subramoni, Bharath Ramesh, Ching-Hsiang Chu, Arpan Jain, Dhabaleswar K. Panda, Nick Sarkauskas, Kaushik Kandadi Suresh, Pouya Kousha
Published in:
HiPC
The recent advent of advanced fabrics like NVIDIA NVLink is enabling the deployment of dense Graphics Processing Unit (GPU) systems, e.g., DGX-2 and Summit. The Message Passing Interface (MPI) has been the dominant programming model to design distributed…