Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments

Authors: Simon Pickartz, Tim Süß, Lars Nagel, André Brinkmann, Ramy Gad, Stefan Lankes, Antonello Monti
Year of publication: 2018
Source: The Journal of Supercomputing. 74:6236-6257
ISSN: 1573-0484, 0920-8542
DOI: 10.1007/s11227-018-2548-6
Description: Virtualization has become an indispensable tool in data centers and cloud environments to flexibly assign virtual machines (VMs) to resources. Virtualization is also becoming increasingly attractive for high-performance computing (HPC). This is mainly due to the strong isolation of VMs, which enables: (1) the sharing of cluster nodes and optimization of the system’s overall utilization; (2) load balancing by means of migrations due to the reduction of residual dependencies; and (3) the creation of system-level checkpoints, increasing the fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available on the process level. This information has a direct influence on the checkpoint size, which should be kept as small as possible. In this paper, we propose a novel technique for checkpoint size reduction in virtualized environments. We exploit the fact that the hypervisor detects zero pages and omits them when capturing a checkpoint. Moreover, compression techniques are applied for a further reduction of the checkpoint size. We therefore fill freed memory regions with zeros, which supports both the zero-page detection and the compression. We evaluate our approach using HPC applications as examples. The results reveal a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and up to 49% with compression enabled. Furthermore, memory zeroing is able to reduce VM migration time by up to 10% when compression is disabled and by up to 60% when compression is enabled.
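The idea of filling freed memory with zeros can be illustrated with a minimal C sketch. This is not the authors' implementation, which the abstract does not describe in detail. The names `zalloc`, `zfree`, `zero_region`, and `zalloc_header` are hypothetical, and the size header placed in front of each allocation is an assumption made only so that the deallocator knows how many bytes to clear.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Call memset through a volatile function pointer so the compiler cannot
   drop the store as "dead" just because free() follows it. */
static void *(*const volatile memset_fn)(void *, int, size_t) = memset;

/* Overwrite len bytes starting at ptr with zeros. */
static void zero_region(void *ptr, size_t len)
{
    memset_fn(ptr, 0, len);
}

/* Hypothetical header stored in front of every allocation. The union with
   max_align_t keeps the user pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} zalloc_header;

/* Allocate size bytes and remember the size for the later zeroing step. */
void *zalloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(zalloc_header))
        return NULL;
    zalloc_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Zero the user region, then hand the block back to the allocator. */
void zfree(void *ptr)
{
    if (ptr == NULL)
        return;
    zalloc_header *h = (zalloc_header *)ptr - 1;
    zero_region(ptr, h->size);
    free(h);
}
```

In this sketch only the user region is cleared. The header, and whatever bookkeeping the underlying allocator writes into the freed block, may remain non-zero, so a page presumably becomes a zero page only when it lies entirely inside cleared regions. Smaller cleared regions can still help the compression step mentioned in the abstract.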
Database: OpenAIRE