Showing 1 - 10 of 13 for search: '"Charlene Yang"'
Published in:
Lecture Notes in Networks and Systems ISBN: 9783030801250
This paper presents a practical methodology for collecting performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit for broader support of a range of data precis
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::2c18e6f5a801de1773c78e22a071dab8
https://doi.org/10.1007/978-3-030-80126-7_35
Author:
Melisa Alkan, Keren Zhou, Christopher S. Daley, William Huhn, Swaroop Pophale, Michael Kruse, Ed D'Azevedo, Charlene Yang, Meifeng Lin, Mauro Del Ben, Dhruva Kulkarni, Paul Lin, Mark S. Gordon, Barbara Chapman, Peng Xu, Vivek S. Kale, Johannes Doerfert, Pui-Kuen Yeung, Oscar Hernandez, Tosaporn Sattasathuchana, Colleen Bertoni, Kiran Ravikumar, Dossay Oryspayev, Yun He, Buu Pham
Published in:
OpenMP: Enabling Massive Node-Level Parallelism ISBN: 9783030852610
IWOMP
DOE / OSTI
This paper reports on experiences gained and practices adopted when using the latest features of OpenMP to port a variety of HPC applications and mini-apps based on different computational motifs (BerkeleyGW, WDMApp/XGC, GAMESS, GESTS, and GridMini)
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::368934e119a5794f2b8e621a2934667d
https://doi.org/10.1007/978-3-030-85262-7_5
Author:
Jack Deslippe, Steven G. Louie, Zhenglu Li, Charlene Yang, Mauro Del Ben, Felipe H. da Jornada
Published in:
SC
Large-scale GW calculations are the state-of-the-art approach to accurately describe many-body excited-state phenomena in complex materials. This is critical for novel device design but due to their extremely high computational cost, these calculatio
Published in:
DLS@SC
Deep learning applications are usually very compute-intensive and require a long run time for training and inference. This has been tackled by researchers from both hardware and software sides, and in this paper, we propose a Roofline-based approach
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::92c9be298bee233db9304c0f8a0e60af
http://arxiv.org/abs/2009.04598
Author:
Hugo Brunie, Jack Deslippe, Rahulkumar Gayatri, Yunsong Wang, Samuel Williams, Leonid Oliker, Charlene Yang, Muaaz Gul Awan, Jonathan Madsen
Published in:
Lecture Notes in Computer Science ISBN: 9783030507428
HPC has undergone a significant transition toward heterogeneous architectures. This transition has introduced several issues in code migration to support multiple frameworks for targeting the various architectures. In order to cope with these challen
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::2c866fb29ecb98ab87c7e63a5f95ff39
https://doi.org/10.1007/978-3-030-50743-5_22
Published in:
Concurrency and Computation: Practice and Experience. 32
Published in:
Accelerator Programming Using Directives ISBN: 9783030122737
WACCPD@SC
In recent years, the HPC landscape has shifted away from traditional multi-core CPU systems to energy-efficient architectures, such as many-core CPUs and accelerators like GPUs, to achieve high performance. The goal of performance portability is to e
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::6d23037dc46bcd3434dd761c1739ed1d
https://doi.org/10.1007/978-3-030-12274-4_4
Author:
Brandon Cook, Thorsten Kurth, Samuel Williams, Brian Friesen, Zahra Ronaghi, Leonid Oliker, Douglas W. Doerfler, Adedoyin Adetokunbo, Jack Deslippe, Rahulkumar Gayatri, Protonu Basu, Charlene Yang
Published in:
2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC).
System and node architectures continue to diversify to better balance on-node computation, memory capacity, memory bandwidth, interconnect bandwidth, power, and cost for specific computational workloads. For many application developers, achieving per
Author:
Brian Austin, Thorsten Kurth, Nicholas J. Wright, Jack Deslippe, Chris Daley, Brandon Cook, Douglas W. Doerfler, Brian Friesen, Charlene Yang
Published in:
PMBS@SC
When acquiring a supercomputer it is desirable to specify its performance using a single number. For many procurements, this is usually stated as a performance increase over a current generation platform, for example machine A provides 10 times great
Published in:
Lecture Notes in Computer Science ISBN: 9783030024642
ISC Workshops
The CSB_Coo sparse matrix format is especially useful in situations such as eigenvalue problems where efficient SPMV and transposed SPMV_T operations are required. One strategy to increase the arithmetic intensity of large scale parallel solvers is t
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::3625d583f6d4241b5ac5d8a280b31fc2
https://doi.org/10.1007/978-3-030-02465-9_33