RelaxFault memory repair
Author: | Mattan Erez, Dong Wan Kim |
---|---|
Year of publication: | 2016 |
Subject: | 010302 applied physics; Redundant array of independent memory; business.industry; Computer science; Fault tolerance; 02 engineering and technology; General Medicine; Fault (power engineering); Supercomputer; 01 natural sciences; 020202 computer hardware & architecture; Microarchitecture; Reliability engineering; Memory management; Memory module; Embedded system; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); Fault model; business; Resilience (network); DRAM |
Source: | ISCA |
ISSN: | 0163-5964 |
Description: | Memory system reliability is a serious concern in many systems today, and is becoming more worrisome as technology scales and system size grows. Stronger fault tolerance capability is therefore desirable, but often comes at high cost. In this paper, we propose a low-cost, fault-aware, hardware-only resilience mechanism, RelaxFault, that repairs the vast majority of memory faults using a small amount of the LLC to remap faulty memory locations. RelaxFault requires less than 100 KiB of LLC capacity and has near-zero impact on performance and power. By repairing faults, RelaxFault relaxes the requirement for high fault tolerance of other mechanisms, such as ECC. A better tradeoff between resilience and overhead is made by exploiting an understanding of memory system architecture and fault characteristics. We show that RelaxFault provides better repair capability than prior work of similar cost, improves memory reliability to a greater extent, and significantly reduces the number of maintenance events and memory module replacements. We also propose a more refined memory fault model than prior work and demonstrate its importance. |
Database: | OpenAIRE |