Dependable Computing Systems Based on Checkpointing and Rollback Recovery Mechanisms

Author: Jeng-Ping Lin (林正平)
Year of publication: 1999
Document type: Degree thesis (學位論文)
Description: 87
In fault-tolerant computing systems, checkpointing and rollback recovery are commonly used to achieve dependable computing. During failure-free execution, the state of a process is periodically saved to stable storage; each saved state is called a checkpoint. When a failure occurs, the rollback recovery mechanism restarts the process from a saved checkpoint instead of from the beginning. For long-running applications such as unmanned space flights, checkpointing and rollback recovery can minimize the total execution time when failures occur. For mission-critical service-providing applications such as banking systems, they enable faster recovery, which reduces service downtime and thus improves service availability. Because checkpointing and rollback recovery have been studied extensively for static message-passing systems, this thesis focuses on applying the mechanism to other kinds of systems. We explore two: shared-memory multiprocessor systems and mobile computing systems with wireless communication. In shared-memory multiprocessor systems, processes communicate not by passing messages but by reading and writing shared variables in shared memory. Moreover, additional hardware support may be required to reduce the system overhead incurred by checkpointing and rollback recovery.
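As a minimal sketch (not the method developed in the thesis), the following Python simulates the basic single-process mechanism summarized in the abstract: state is saved to stable storage every few steps, and after a simulated failure the process resumes from the most recent checkpoint rather than from the start. The names StableStorage and run_with_checkpoints, the checkpoint interval, and the failure schedule are hypothetical; here "stable storage" is just an object assumed to survive the simulated failure, whereas a real system would use disk or other crash-surviving storage.

```python
import copy


class StableStorage:
    """Hypothetical stand-in for stable storage: assumed to survive failures."""

    def __init__(self):
        self._checkpoint = None

    def save(self, state):
        # Store an independent copy so later changes to volatile state don't leak in.
        self._checkpoint = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self._checkpoint)


def run_with_checkpoints(total_steps, interval, fail_at, storage):
    """Run `total_steps` steps, checkpointing every `interval` steps.

    `fail_at` is a set of step indices at which a simulated failure occurs
    (each fires once). Returns (final_state, number_of_steps_executed).
    """
    state = {"step": 0, "acc": 0}
    storage.save(state)  # initial checkpoint
    pending_failures = set(fail_at)
    executed = 0
    while state["step"] < total_steps:
        if state["step"] in pending_failures:
            pending_failures.discard(state["step"])
            # Failure: volatile state is lost; roll back to the last checkpoint.
            state = storage.load()
            continue
        # Failure-free execution of one unit of work.
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % interval == 0:
            storage.save(state)  # periodic checkpoint
    return state, executed


if __name__ == "__main__":
    final, steps = run_with_checkpoints(10, 4, {6}, StableStorage())
    print(final, steps)  # prints {'step': 10, 'acc': 45} 12
```

With a checkpoint every 4 steps and a failure injected before step 6, only steps 4 and 5 are re-executed (12 steps in total instead of 10), and the final result matches a failure-free run. Shortening the interval reduces the work lost per failure but adds more saves during failure-free execution, which is the overhead trade-off that checkpointing schemes must balance.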
Database: Networked Digital Library of Theses & Dissertations