Detecting SDCs in GPGPUs Through an Efficient Instruction Duplication Mechanism

Authors: Nan Jiang, Xiaonan Wang, Hengshan Yue, Xiaohui Wei
Year of publication: 2021
Subject:
Source: Knowledge Science, Engineering and Management ISBN: 9783030821524
KSEM
DOI: 10.1007/978-3-030-82153-1_47
Description: As General-Purpose Graphics Processing Units (GPGPUs) are widely used in High-Performance Computing (HPC) applications, their vulnerability to soft errors has become a critical concern. In this paper, we propose an efficient instruction duplication mechanism that duplicates only the instructions vulnerable to silent data corruption (SDC), reducing the overhead of reliability protection. We first observe that the SDC proneness of an individual instruction is related to its instruction type, its fault propagation, and whether it affects shared memory. Then, leveraging these observed factors, we use machine learning to identify the SDC-vulnerable instructions of GPU applications and protect them efficiently. Experimental results show that our method achieves 90.45% SDC coverage while duplicating only 37.8% of static instructions, a significant improvement in both performance and SDC detection capability over the state-of-the-art duplication technique in GPUs.
Database: OpenAIRE
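
Note: the description above refers to selective instruction duplication. The following is a minimal Python sketch of the general idea only, not the authors' implementation: an operation flagged as vulnerable is executed twice and the two results are compared, while an unflagged operation runs once. The names `run_protected`, `flip_bit`, and `SDCDetected` are hypothetical, and the `vulnerable` flag stands in for the output of whatever classifier selects instructions to protect.

```python
import operator
from typing import Callable, Optional


class SDCDetected(Exception):
    """Raised when the original and duplicated results disagree."""


def flip_bit(bit: int) -> Callable[[int], int]:
    """Return a function that flips one bit of an integer (a simulated soft error)."""
    return lambda value: value ^ (1 << bit)


def run_protected(
    op: Callable[[int, int], int],
    a: int,
    b: int,
    vulnerable: bool,
    fault: Optional[Callable[[int], int]] = None,
) -> int:
    """Execute op(a, b), duplicating it only when it is flagged as vulnerable."""
    primary = op(a, b)
    if fault is not None:
        # Simulate a transient fault corrupting the primary result.
        primary = fault(primary)
    if not vulnerable:
        # Unprotected instruction: any corruption passes through silently.
        return primary
    # Protected instruction: re-execute and compare against the primary result.
    shadow = op(a, b)
    if primary != shadow:
        raise SDCDetected(f"mismatch: {primary} != {shadow}")
    return primary
```

In this sketch, a fault injected into an unprotected operation silently changes the returned value, whereas the same fault in a protected operation raises `SDCDetected`. The cost is one extra execution and one comparison per protected operation, which is why limiting duplication to a subset of instructions lowers overhead.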