Checkpointing and rollback recovery for network of workstations

Authors: Dingxing Wang, Meiming Shen, Weimin Zheng, Dongsheng Wang
Publication year: 1999
Subject:
Source: Science in China Series E: Technological Sciences. 42:207-214
ISSN: 1862-281X, 1006-9321
DOI: 10.1007/bf02917117
Description: Networks of workstations (NOW) have become one of the main trends in parallel computing. Long-running scientific programs, however, need effective fault tolerance on a NOW because of the changing nature of such an environment. Checkpointing and rollback recovery is one solution to this problem. The paper first discusses the main problems of rollback recovery and analyzes the different checkpointing techniques for NOW, then describes the design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system. A comparison of three coordinated checkpointing systems is also given.
Database: OpenAIRE