A Survey on Multithreading Alternatives for Soft Error Fault Tolerance
Authors: | Isil Oz, Sanem Arslan |
---|---|
Year of publication: | 2019 |
Subject: | 020203 distributed computing; General Computer Science; Computer science; business.industry; Fault tolerance; 02 engineering and technology; Fault detection and isolation; 020202 computer hardware & architecture; Theoretical Computer Science; Soft error; Software; Computer engineering; Multithreading; 0202 electrical engineering, electronic engineering, information engineering; Redundancy (engineering); business; Implementation; Strengths and weaknesses |
Source: | ACM Computing Surveys. 52:1-38 |
ISSN: | 1557-7341 0360-0300 |
DOI: | 10.1145/3302255 |
Description: | Smaller transistor sizes and reduced voltage levels in modern microprocessors induce higher soft error rates. This trend makes reliability a primary design constraint for computer systems. Redundant multithreading (RMT) makes use of parallelism in modern systems by employing thread-level time redundancy for fault detection and recovery. RMT detects faults by running identical copies of a program as separate threads in parallel execution units with identical inputs and comparing their outputs. In this article, we present a survey of RMT implementations at different architectural levels with several design considerations. We explain the implementations in seminal papers and their extensions and discuss the design choices employed by the techniques. We review both hardware and software approaches by presenting their main characteristics, and we analyze the studies with different design choices regarding their strengths and weaknesses. We also present a classification to help potential users find a suitable method for their requirements and to guide researchers planning to work in this area by providing insights into future trends. |
Database: | OpenAIRE |