CrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures
Author: | Christoph Borchert, Olaf Spinczyk, Arthur Martens, Rüdiger Kapitza, Manuel Nieke |
---|---|
Year of publication: | 2016 |
Subject: | 021110 strategic, defence & security studies; Multi-core processor; business.industry; Computer science; Distributed computing; 0211 other engineering and technologies; Crash; Fault tolerance; 02 engineering and technology; Reliability engineering; Software; High availability; Checksum; 0202 electrical engineering, electronic engineering, information engineering; Recovery mechanism; 020201 artificial intelligence & image processing; The Internet; business |
Source: | EDCC |
Description: | High availability is no longer optional, as more and more Internet-based services provide economically or otherwise critical offerings. Traditionally, crash faults are addressed using state-machine replication (SMR), and critical data is selectively protected by checksums. Both techniques can be combined efficiently; however, large parts of a service remain susceptible to transient errors such as bit-flips or more severe state corruptions. To address this weakness, and to reduce the laborious and non-trivial effort of identifying and selectively hardening a complex service, we propose CrossCheck, a holistic approach. CrossCheck extends the crash-fault protection of SMR to also tolerate arbitrary state corruptions, with a particular focus on multithreaded applications. This is achieved by a fine-grained state comparison and a precise recovery mechanism that uses fault-free replicas. The implementation utilizes aspect-oriented programming and therefore requires only minimal manual changes to the underlying software. In our evaluation, we show that a multithreaded key-value store can be made resilient to crashes and hardened against arbitrary state corruptions with moderate overhead. |
Database: | OpenAIRE |
External link: |
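
The description mentions a fine-grained state comparison and a recovery step that uses fault-free replicas. The sketch below is only an illustration of that general idea for a replicated key-value store, not CrossCheck's actual implementation (which, per the description, relies on aspect-oriented programming). It assumes each replica's state is a plain dictionary of byte strings, that a strict majority of replicas is fault-free, and that per-key SHA-256 digests are a suitable comparison unit. The names `digest`, `find_divergent`, and `repair` are hypothetical.

```python
import hashlib
from collections import Counter


def digest(value: bytes) -> str:
    """Checksum of a single state item (assumption: SHA-256 per key)."""
    return hashlib.sha256(value).hexdigest()


def find_divergent(replicas):
    """Compare replica states key by key.

    Returns {key: (majority_value, [indices of disagreeing replicas])}.
    A majority_value of None means the majority does not hold the key.
    """
    keys = set().union(*(r.keys() for r in replicas))
    report = {}
    for key in keys:
        # A missing key counts as its own state (None).
        digests = [digest(r[key]) if key in r else None for r in replicas]
        winner, votes = Counter(digests).most_common(1)[0]
        if votes == len(replicas):
            continue  # all replicas agree on this key
        if votes <= len(replicas) // 2:
            raise RuntimeError(f"no majority for key {key!r}")
        source = digests.index(winner)  # a replica in the majority
        bad = [i for i, d in enumerate(digests) if d != winner]
        report[key] = (replicas[source].get(key), bad)
    return report


def repair(replicas):
    """Overwrite only the corrupted items, copying from the majority."""
    report = find_divergent(replicas)
    for key, (good, bad) in report.items():
        for i in bad:
            if good is None:
                replicas[i].pop(key, None)
            else:
                replicas[i][key] = good
    return report


# Usage: replica 2 suffered a corruption in the value stored under "b".
replicas = [
    {"a": b"1", "b": b"2"},
    {"a": b"1", "b": b"2"},
    {"a": b"1", "b": b"X"},
]
report = repair(replicas)
```

Comparing per key rather than hashing the whole state is what keeps the repair precise in this sketch: only the items that differ from the majority are rewritten, and the rest of the replica's state is left untouched.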