Improving the management efficiency of GPU workloads in data centers through GPU virtualization

Author:	Javier Prades, Sergio Iserte, Carlos Reaño, Federico Silla
Language:	English
Year of publication:	2019
Subject:	Computer science; Computer Networks and Communications; Management efficiency; InfiniBand; GPU; rCUDA; CUDA; Theoretical Computer Science; Data centers; Slurm; Virtualization; Computer Science Applications; Computer Architecture and Technology (Arquitectura y Tecnología de Computadores); Computational Theory and Mathematics; Operating system; Software
Source:	Iserte, S, Prades, J, Reaño, C & Silla, F 2019, 'Improving the management efficiency of GPU workloads in data centers through GPU virtualization', Concurrency Computation, pp. 1-16. https://doi.org/10.1002/cpe.5275; RiuNet, Repositorio Institucional de la Universitat Politècnica de València; Repositori Universitat Jaume I
ISSN:	2014-5349
DOI:	10.1002/cpe.5275
Description:	Graphics processing units (GPUs) are currently used in data centers to reduce the execution time of compute-intensive applications. However, the use of GPUs has several drawbacks, such as increased acquisition costs and larger space requirements. Furthermore, GPUs require a non-negligible amount of energy even while idle. Additionally, GPU utilization is usually low for most applications. In a similar way to the use of virtual machines, using virtual GPUs may address the concerns associated with these devices. In this regard, the remote GPU virtualization mechanism could be leveraged to share the GPUs present in the computing facility among the nodes of the cluster. This would increase overall GPU utilization, thus reducing the negative impact of the increased costs mentioned above. It could also make it possible to reduce the number of GPUs installed in the cluster. However, in the same way that job schedulers map GPU resources to applications, virtual GPUs should also be scheduled before job execution. Current job schedulers, however, are not able to deal with virtual GPUs. In this paper, we analyze the performance attained by a cluster using the remote Compute Unified Device Architecture (rCUDA) middleware and a modified version of the Slurm scheduler, which is now able to assign remote GPUs to jobs. Results show that cluster throughput, measured as jobs completed per time unit, is doubled while total energy consumption is reduced by up to 40%. GPU utilization is also increased. Funding: Generalitat Valenciana, Grant/Award Number: PROMETEO/2017/077; MINECO and FEDER, Grant/Award Numbers: TIN2014-53495-R, TIN2015-65316-P and TIN2017-82972-R.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e74dd4032ebd1cea07960465b1701d6f https://pure.qub.ac.uk/en/publications/improving-the-management-efficiency-of-gpu-workloads-in-data-centers-through-gpu-virtualization(1533df08-3e38-4505-ae2d-b9c26dcc178d).html