Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments
Authors: | Simon Pickartz, Tim Süß, Lars Nagel, André Brinkmann, Ramy Gad, Stefan Lankes, Antonello Monti |
---|---|
Year of publication: | 2018 |
Subject: | business.industry; Computer science; Distributed computing; Process (computing); 020206 networking & telecommunications; Fault tolerance; Hypervisor; Cloud computing; 02 engineering and technology; Virtualization; computer.software_genre; Theoretical Computer Science; Reduction (complexity); Hardware and Architecture; Virtual machine; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Isolation (database systems); business; computer; Software; Information Systems |
Source: | The Journal of Supercomputing. 74:6236-6257 |
ISSN: | 1573-0484 0920-8542 |
DOI: | 10.1007/s11227-018-2548-6 |
Description: | Virtualization has become an indispensable tool in data centers and cloud environments to flexibly assign virtual machines (VMs) to resources. Virtualization also becomes more and more attractive for high-performance computing (HPC). This is mainly due to the strong isolation of VMs which enables: (1) the sharing of cluster nodes and optimization of the system’s overall utilization; (2) load balancing by means of migrations due to the reduction of residual dependencies; and (3) the creation of system-level checkpoints increasing the fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available on the process level. This information has a direct influence on the checkpoint size which should be kept as small as possible. In this paper, we propose a novel technique for checkpoint size reduction in virtualized environments. We exploit the fact that the hypervisor detects zero pages which are omitted when capturing a checkpoint. Moreover, compression techniques are applied for a further reduction of the checkpoint size. We therefore fill freed memory regions with zeros supporting both the zero-page detection and the compression. We evaluate our approach by taking the example of HPC applications. The results reveal a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and up to 49% with compression enabled. Furthermore, memory zeroing is able to reduce VM migration time by up to 10% when compression is disabled and by up to 60% when compression is enabled. |
Database: | OpenAIRE |
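The description states that zero pages are omitted from a checkpoint and that zero-filled freed memory also helps compression. The following is a minimal toy sketch of that idea, not the authors' implementation and not a hypervisor interface: the 4096-byte page size, the 64-page memory image, the location of the freed region, the function names, and the use of `zlib` as the compressor are all assumptions made for illustration.

```python
import random
import zlib

PAGE_SIZE = 4096   # assumed page size; real systems may differ
NUM_PAGES = 64     # size of the toy memory image (hypothetical)


def pages(image):
    """Split a memory image into PAGE_SIZE chunks."""
    return [image[o:o + PAGE_SIZE] for o in range(0, len(image), PAGE_SIZE)]


def toy_checkpoint(image, compress):
    """Drop all-zero pages, then optionally compress what is left.

    This only mimics the two effects named in the description
    (zero-page omission and compression); it is not a real format.
    """
    kept = b"".join(p for p in pages(image) if any(p))
    return zlib.compress(kept) if compress else kept


# Memory image full of random bytes, standing in for stale application data.
rng = random.Random(0)
stale = bytearray(rng.getrandbits(8) for _ in range(NUM_PAGES * PAGE_SIZE))

# Hypothetical freed region that starts and ends mid-page, so it covers
# 31 whole pages plus the second half of page 16 and the first half of page 48.
free_start = 16 * PAGE_SIZE + PAGE_SIZE // 2
free_end = 48 * PAGE_SIZE + PAGE_SIZE // 2

# Same image, but with the freed region overwritten with zeros.
zeroed = bytearray(stale)
zeroed[free_start:free_end] = bytes(free_end - free_start)

for label, image in (("stale", stale), ("zeroed", zeroed)):
    raw = len(toy_checkpoint(bytes(image), compress=False))
    packed = len(toy_checkpoint(bytes(image), compress=True))
    print(f"{label:>6}: {raw} bytes uncompressed, {packed} bytes compressed")
```

In this sketch, the fully zeroed pages disappear from the uncompressed checkpoint, while the two partially zeroed pages are still written and shrink only when compression is enabled. The relative savings here depend entirely on the made-up image and say nothing about the percentages reported in the paper.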