Showing 1 - 10 of 28 for search: "Saurabh Hukerikar"
Authors:
Michael B. Sullivan, Nirmal R. Saxena, Mike O'Connor, Donghyuk Lee, Paul Racunas, Saurabh Hukerikar, Timothy Tsai, Siva Kumar Sastry Hari, Stephen W. Keckler
Published in:
IEEE Micro. 42:69-77
Authors:
Saurabh Hukerikar, Nirmal Saxena
Published in:
2022 IEEE International Test Conference (ITC).
Authors:
Michael B. Sullivan, Nirmal Saxena, Stephen W. Keckler, Mike O'Connor, Siva Kumar Sastry Hari, Saurabh Hukerikar, Paul Racunas, Timothy Tsai, Donghyuk Lee
Published in:
MICRO
GPUs are used in high-reliability systems, including high-performance computers and autonomous vehicles. Because GPUs employ a high-bandwidth, wide interface to DRAM and fetch each memory access from a single DRAM device, implementing full-device cor
Authors:
Christian Engelmann, Saurabh Hukerikar
Published in:
PRDC
For high-performance computing (HPC) system designers and users, meeting the myriad challenges of next-generation exascale supercomputing systems requires rethinking their approach to application and system software design. Among these challenges, pr
Published in:
International Journal of Parallel Programming. 46:225-251
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-obli
Authors:
Paul Racunas, Saurabh Hukerikar, Yanxiang Huang, Richard Bramley, Atieh Lotfi, Keshav Balasubramanian, Nirmal Saxena
Published in:
ITC
Safety is the most important aspect of an autonomous driving platform. Deep neural networks (DNNs) play an increasingly critical role in localization, perception, and control in these systems. The object detection and classification inference are of
Published in:
PDP
Efficient utilization of today's high-performance computing (HPC) systems with complex hardware and software components requires that the HPC applications are designed to tolerate process failures at runtime. With low mean time to failure (MTTF) of c
Published in:
ICPE
Resiliency is the ability of large-scale high-performance computing (HPC) applications to gracefully handle errors, and recover from failures. In this paper, we propose a pattern-based approach to constructing resilience solutions that handle multipl
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ab20a5b47402e9de830acf49f7ce61ea
http://arxiv.org/abs/1802.08233
Authors:
Christian Engelmann, Saurabh Hukerikar
Published in:
Euro-Par 2017: Parallel Processing Workshops ISBN: 9783319751771
Euro-Par Workshops
With the growing scale and complexity of high-performance computing (HPC) systems, resilience solutions that ensure continuity of service despite frequent errors and component failures must be methodically designed to balance the reliability requirem
External links:
https://explore.openaire.eu/search/publication?articleId=doi_________::1f0c34097dda61a5bc95d2f862b95ace
https://doi.org/10.1007/978-3-319-75178-8_45