Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy
Author: | Ching-Yao Chou, An-Yeu Wu, Huai-Ting Li, Yuan-Ting Hsieh, Wei-Ching Chu |
---|---|
Year of publication: | 2017 |
Subject: |
010302 applied physics; Triple modular redundancy; Multi-core processor; business.industry; Computer science; Distributed computing; 02 engineering and technology; 01 natural sciences; 020202 computer hardware & architecture; Soft error; Hardware and Architecture; Robustness (computer science); Embedded system; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Redundancy (engineering); Systems design; Electrical and Electronic Engineering; business; Dual modular redundancy; Software |
Source: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 25:2803-2816 |
ISSN: | 1557-9999 1063-8210 |
DOI: | 10.1109/tvlsi.2017.2715803 |
Description: | Reliability issues are more severe in multi/many-core systems because advanced technology nodes integrate more devices. To achieve robust computing in nanoscale designs, many circuit-level and architecture-level redundancy techniques have been proposed, but they impose a large fixed silicon area overhead and lack flexibility. In recent years, some methods have exploited the “inherent core redundancy” of many-core systems to implicitly implement N-modular redundant (NMR) subsystems and thereby achieve area-efficient fault-tolerant computing. However, existing core-level redundancy methods become ineffective when the soft error rate, task vulnerability, and task significance vary across the many-core system. To achieve robust computation in many-core systems with intercore variations and mixed workloads, we propose a variation-aware core-level redundancy scheme. Two novel approaches are presented in this scheme: 1) we construct NMR tables that store the degree of redundancy, using mathematical models for systems affected by these variations, and 2) we dynamically allocate each replicated task to a suitable core with variation-aware mapping algorithms to achieve high reliability. Experimental results based on a modified multicore simulator, Sniper-Transient Error Process Variation (TEVR), show that the proposed scheme increases reliability by 47.92% and achieves energy savings of 39% compared with conventional core-level redundancy methods. |
Database: | OpenAIRE |
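
The abstract mentions tables that store a degree of redundancy for cores whose error behavior differs. As a rough illustration only, the sketch below computes the reliability of an N-modular redundant group under majority voting and picks the smallest odd N that meets a target. It is not the paper's model: it assumes independent core failures, a perfect voter, and per-core success probabilities given as inputs. The names `nmr_reliability` and `min_redundancy` are hypothetical.

```python
def nmr_reliability(core_reliabilities):
    """Probability that a strict majority of replicas is correct.

    Assumes independent failures and a perfect voter (illustrative
    assumptions, not taken from the paper).
    """
    # dist[k] = probability that exactly k replicas so far are correct
    dist = [1.0]
    for r in core_reliabilities:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - r)   # this replica is wrong
            nxt[k + 1] += p * r       # this replica is correct
        dist = nxt
    majority = len(core_reliabilities) // 2 + 1
    return sum(dist[majority:])


def min_redundancy(core_reliabilities, target, max_n=7):
    """Smallest odd N whose N most reliable cores reach `target`.

    Returns None if no odd N up to `max_n` (or the number of
    available cores) is sufficient.
    """
    ranked = sorted(core_reliabilities, reverse=True)
    for n in range(1, min(max_n, len(ranked)) + 1, 2):
        if nmr_reliability(ranked[:n]) >= target:
            return n
    return None


if __name__ == "__main__":
    cores = [0.9, 0.9, 0.9, 0.9, 0.9]   # hypothetical per-core reliabilities
    print(nmr_reliability(cores[:3]))   # triple modular redundancy
    print(min_redundancy(cores, target=0.97))
```

With unequal cores, adding replicas does not always help: under these assumptions a single highly reliable core can outperform a triple that includes two weaker ones, which is one reason a variation-aware choice of N and of core placement matters.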