Seamlessly Managing HPC Workloads Through Kubernetes

Authors:	Ignacio Blanquer, J. Damià Segrelles, Sergio López-Huguet, Marek Kasztelnik, Marian Bubak
Language:	English
Publication year:	2020
Subject:	Job scheduler; Integrating cloud and HPC; Workload; Cloud computing; Supercomputer; Docker and Singularity containers; Operating system; Batch processing; Kubernetes; Computer science; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; LENGUAJES Y SISTEMAS INFORMATICOS; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 03 medical and health sciences; 0303 health sciences; 030304 developmental biology; 3. Good health
Source:	Lecture Notes in Computer Science; ISBN: 9783030598501; ISC Workshops; RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Description:	[EN] This paper describes an approach for integrating the job management of High Performance Computing (HPC) infrastructures into cloud architectures, so that HPC workloads are managed seamlessly from the cloud job scheduler. The paper presents hpc-connector, an open-source tool designed to manage the full life cycle of jobs on the HPC infrastructure from the cloud job scheduler by interacting with the workload manager of the HPC system. The key point is that, because hpc-connector runs in the cloud infrastructure, the execution of a job that it manages on the HPC infrastructure is reflected in the cloud infrastructure. If the user cancels the cloud job, hpc-connector, which catches Operating System (OS) signals (for example, SIGINT), cancels the job on the HPC infrastructure too. It can also retrieve logs on request. Therefore, with hpc-connector the cloud job scheduler can manage jobs on the HPC infrastructure without requiring any special privileges, because no changes to the job scheduler are needed. Finally, we perform an experiment in which a neural network for automated segmentation of neuroblastoma tumours is trained on the Prometheus supercomputer, using hpc-connector as a batch job from a Kubernetes infrastructure. The work presented in this article has been partially funded by the regional government of the Comunitat Valenciana (Spain), co-funded by the European Union ERDF funds (European Regional Development Fund) of the Comunitat Valenciana 2014-2020, with reference IDIFEDER/2018/032 (High-Performance Algorithms for the Modeling, Simulation and early Detection of diseases in Personalized Medicine). The work is also co-funded by PRIMAGE (PRedictive In-silico Multiscale Analytics to support cancer personalised diaGnosis and prognosis, empowered by imaging biomarkers), a Horizon 2020 RIA project funded under the topic SC1-DTH-07-2018 by the European Commission, with grant agreement no. 826494.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e73e4b8e0c144bdc3ccb079b5c18d4d5 https://hdl.handle.net/10251/179810
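
Illustrative note: the description says that the cloud-side process mirrors the state of the HPC job and, on catching an OS signal such as SIGINT, cancels the job on the HPC side. The Python sketch below shows one way that pattern could look. It is not hpc-connector's code or API. `FakeWorkloadManager`, `run_mirrored_job`, the state strings, and the job identifier are hypothetical names introduced only for illustration, and the in-memory manager stands in for a real HPC workload manager. SIGTERM is handled as well as SIGINT on the assumption that the wrapper runs in a Kubernetes pod, where SIGTERM is the signal sent to a container on deletion.

```python
import signal
import time
from dataclasses import dataclass, field


@dataclass
class FakeWorkloadManager:
    """Hypothetical in-memory stand-in for an HPC workload manager."""
    polls_until_done: int = 3
    cancelled: list = field(default_factory=list)
    _polls: int = 0

    def submit(self, script: str) -> str:
        # A real backend would submit the script and return the remote job id.
        return "job-1"

    def state(self, job_id: str) -> str:
        if job_id in self.cancelled:
            return "CANCELLED"
        self._polls += 1
        return "COMPLETED" if self._polls >= self.polls_until_done else "RUNNING"

    def cancel(self, job_id: str) -> None:
        self.cancelled.append(job_id)


class Cancelled(Exception):
    """Raised by the signal handler to unwind the polling loop."""


def _raise_cancelled(signum, frame):
    raise Cancelled(signum)


def run_mirrored_job(manager, script, poll_seconds=0.01):
    """Submit a job, poll until it leaves RUNNING, cancel it if signalled."""
    watched = (signal.SIGINT, signal.SIGTERM)
    # Install handlers and remember the previous ones so they can be restored.
    previous = {s: signal.signal(s, _raise_cancelled) for s in watched}
    job_id = None
    try:
        job_id = manager.submit(script)
        while True:
            state = manager.state(job_id)
            if state != "RUNNING":
                return state
            time.sleep(poll_seconds)
    except Cancelled:
        # Propagate the cancellation of the cloud job to the HPC side.
        if job_id is None:
            return "CANCELLED"
        manager.cancel(job_id)
        return manager.state(job_id)
    finally:
        for s, handler in previous.items():
            signal.signal(s, handler)


if __name__ == "__main__":
    print(run_mirrored_job(FakeWorkloadManager(), "train.sh"))
```

Because the wrapper blocks until the remote job ends, the cloud scheduler sees a single long-running process whose lifetime matches that of the HPC job. This is why, in a design like this one, no change to the scheduler itself would be needed.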