Checkpointing Message-Passing Interface Programs

Author: Li, Wei-Jih, 李偉誌
Year of publication: 1996
Document type: Thesis
Description: 84
Many scientific problems can be distributed across a large number of processors to take advantage of low-cost workstations. MPI is a widely used standard for writing message-passing programs on distributed systems. A failure on any host may cause the computation to be lost and require all applications to be restarted. Checkpointing is a simple technique for recovering a failed execution. We apply checkpointing to two implementations of MPI with different mechanisms. In this thesis we describe these two implementations of checkpointing. We also measure and compare the costs imposed by checkpointing.
Database: Networked Digital Library of Theses & Dissertations