GASPI/GPI In-memory Checkpointing Library

Authors: Mirko Rahn, Rui Machado, Valeria Bartsch, Dirk Merten, Franz-Josef Pfreundt
Publication year: 2017
Subject:
Source: Lecture Notes in Computer Science ISBN: 9783319642024
Euro-Par
Description: Fault tolerance is becoming an important feature of large computer systems, where the mean time between failures decreases. Checkpointing is a method often used to provide resilience. We present an in-memory checkpointing library based on a PGAS API implemented with GASPI/GPI. It offers a substantial benefit when recovering from failure and leverages existing fault tolerance features of GASPI/GPI. The overhead of the library is negligible when tested with a simple stencil code and a real-life seismic imaging method.
Database: OpenAIRE