Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy

Authors: Ching-Yao Chou, An-Yeu Wu, Huai-Ting Li, Yuan-Ting Hsieh, Wei-Ching Chu
Year of publication: 2017
Subject:
Source: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 25:2803-2816
ISSN: 1557-9999, 1063-8210
DOI: 10.1109/tvlsi.2017.2715803
Description: Reliability issues are more severe in multi/many-core systems because more devices are integrated in advanced technology nodes. To achieve robust computing in nanoscale designs, many circuit-level and architecture-level redundancy techniques have been proposed, but they incur a large fixed silicon area overhead and lack flexibility. In recent years, some methods have exploited the “inherent core redundancy” of many-core systems to implicitly implement N-modular redundant (NMR) subsystems and thereby achieve area-efficient fault-tolerant computing. However, existing core-level redundancy methods become ineffective when soft error rate, task vulnerability, and task significance differ across the many-core system. To achieve robust computation in many-core systems with intercore variations and mixed workloads, we propose a variation-aware core-level redundancy scheme. The scheme presents two novel approaches: 1) we construct NMR tables that store the degree of redundancy, using mathematical models for systems affected by these variations, and 2) we dynamically allocate each replicated task to a proper core with variation-aware mapping algorithms to achieve high reliability. Based on a modified multicore simulator, Sniper-Transient Error Process Variation (TEVR), the experimental results show that the proposed scheme can increase reliability by 47.92% and achieve an energy saving of 39% compared with conventional core-level redundancy methods.
Database: OpenAIRE
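
Illustrative note: the description mentions NMR tables that store a degree of redundancy derived from mathematical models. The record does not give those models, so the following is only a minimal sketch of the general idea, using the textbook majority-vote reliability formula. It assumes independent core failures, a perfect voter, and per-core reliabilities supplied by the caller; the function names `majority_reliability` and `smallest_redundancy` and the greedy "most reliable cores first" selection are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def majority_reliability(core_reliabilities):
    """Probability that a strict majority of replicas is correct.

    Assumes each replica runs on its own core, cores fail independently,
    and the voter itself never fails (illustrative assumptions only).
    """
    n = len(core_reliabilities)
    needed = n // 2 + 1  # strict majority
    total = 0.0
    for k in range(needed, n + 1):
        # Sum over every way of choosing which k replicas are correct.
        for good in combinations(range(n), k):
            good_set = set(good)
            total += prod(
                r if i in good_set else 1.0 - r
                for i, r in enumerate(core_reliabilities)
            )
    return total


def smallest_redundancy(core_reliabilities, target, max_n=7):
    """Smallest odd N whose N most reliable cores meet the target.

    Returns None when no odd N up to max_n (or the number of cores)
    reaches the target. A lookup table keyed by error rate and task
    class could cache these results; that table layout is hypothetical.
    """
    ranked = sorted(core_reliabilities, reverse=True)
    for n in range(1, min(max_n, len(ranked)) + 1, 2):
        if majority_reliability(ranked[:n]) >= target:
            return n
    return None
```

Under these assumptions, a task that needs reliability 0.95 on cores that each deliver 0.9 would be triplicated, whereas the same task on a 0.99 core would run unreplicated, which is the kind of variation-dependent choice the description refers to.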