CrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures

Authors: Christoph Borchert, Olaf Spinczyk, Arthur Martens, Rüdiger Kapitza, Manuel Nieke
Publication year: 2016
Subject:
Source: EDCC
Description: High availability is no longer optional since more and more Internet-based services provide economically or otherwise critical offerings. Traditionally, crash faults are addressed using state-machine replication (SMR), and critical data is selectively protected by checksums. Both techniques can be combined efficiently; however, large parts of a service remain susceptible to transient errors such as bit-flips or more severe state corruptions. To address this weakness, and to reduce the laborious and non-trivial effort of identifying and selectively hardening a complex service, we propose CrossCheck, a holistic approach. CrossCheck extends the crash-fault protection of SMR to also tolerate arbitrary state corruptions, with a particular focus on multithreaded applications. This is achieved by a fine-grained state comparison and a precise recovery mechanism that uses fault-free replicas. The implementation utilizes aspect-oriented programming and therefore requires only minimal manual changes to the underlying software. In our evaluation, we show that a multithreaded key-value store can be made resilient to crashes and hardened against arbitrary state corruptions with moderate overhead.
Database: OpenAIRE