Architectural vulnerability aware checkpoint placement in a multicore processor

Authors: Atieh Lotfi, Arash Bayat, Saeed Safari
Year of publication: 2012
Subject:
Source: IOLTS
Description: As system complexity increases, the probability of failure increases substantially, so the system requires techniques that support fault tolerance. Checkpointing is widely used to reduce the execution time of long-running programs in the presence of failures and to enhance the reliability of such systems. Several methods have been studied to determine the checkpointing interval that optimizes system performance. The crucial parameter in all of these solutions is the system failure model, which is primarily assumed to follow an exponential or Weibull distribution. However, these models are not fully accurate because they fail to model the effect of soft errors. In this paper, we introduce a more realistic failure model based on the processor's architectural vulnerability factor (AVF). In addition, we propose three checkpoint placement methods, with constant and variable intervals, that determine suitable checkpoint locations for the proposed failure model. Our experimental results show that our method, which is implementable on any multicore system, can find suitable points at which checkpoints should be taken.
Database: OpenAIRE