Energy Accounting and Control with SLURM Resource and Job Management System

Authors: Morris Jette, Thomas Cadeau, Matthieu Hautreux, Yiannis Georgiou, Danny Auble, David Glesser
Contributors: Bull SAS (Bull), PrograMming and scheduling design fOr Applications in Interactive Simulation (MOAIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire d'Informatique de Grenoble (LIG), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), SchedMD, DAM Île-de-France (DAM/DIF), Direction des Applications Militaires (DAM), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
Language: English
Publication year: 2014
Subject:
Source: ICDCN 2014
ICDCN 2014, Jan 2014, Coimbatore, India. ⟨10.1007/978-3-642-45249-9_7⟩
Distributed Computing and Networking ISBN: 9783642452482
ICDCN
DOI: 10.1007/978-3-642-45249-9_7
Description: Energy consumption has gradually become a very important parameter in High Performance Computing (HPC) platforms. The Resource and Job Management System (RJMS) is the HPC middleware responsible for distributing computing power to applications, and it has knowledge of both the underlying resources and the jobs' needs. It is therefore the best candidate for monitoring and controlling the energy consumption of computations according to the job specifications. Integrating energy measurement mechanisms into the RJMS and treating energy consumption as a new accounting characteristic seemed essential at a time when energy has become a bottleneck to scalability. Since power meters would be too expensive, other existing measurement models such as IPMI and RAPL can be exploited by the RJMS to track energy consumption and enhance the monitoring of executions with energy considerations. In this paper we present the design and implementation of a new framework, built upon the SLURM Resource and Job Management System, which provides per-job energy accounting with power profiling capabilities, along with parameters for energy control features based on static frequency scaling of the CPUs. Since the goal of this work is the deployment of the framework on large petaflop-scale clusters such as CURIE, its cost and reliability are important issues. We evaluate the overhead of the design choices and the precision of the monitoring modes using different HPC benchmarks (Linpack, IMB, Stream) on a real-scale platform with integrated power meters. Our experiments show that the overhead is less than 0.6% in energy consumption and less than 0.2% in execution time, while the error deviation compared to power meters is less than 2% in most cases.
Database: OpenAIRE