Co-scheML: Interference-aware Container Co-scheduling Scheme Using Machine Learning Application Profiles for GPU Clusters

Author: Sejin Kim, Yoonhee Kim
Publication year: 2020
Subject:
Source: CLUSTER
Description: Efficient execution of applications on Graphics Processing Units (GPUs) has recently emerged as a research topic for increasing overall system throughput in cluster environments. Because current cluster orchestration platforms that use GPUs support only exclusive execution of one application per GPU, they may not fully utilize GPU resources, depending on application characteristics. Co-executing GPU applications, however, causes interference from resource contention among them. If the diverse resource usage characteristics of GPU applications are not considered, a GPU cluster can suffer unbalanced use of computing resources and performance degradation. This study introduces Co-scheML, a scheme for co-executing various GPU applications such as High Performance Computing (HPC), Deep Learning (DL) training, and DL inference. Because interference is difficult to predict, an interference model is constructed by applying a Machine Learning (ML) model to GPU metrics. The Co-scheML scheduler then uses the predicted interference to decide where to deploy each application. Experimental results show that, compared with baseline schedulers, the Co-scheML strategy improves average job completion time by 23% and shortens the makespan by 22% on average.
Database: OpenAIRE
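
Note: The description outlines a loop in which predicted interference drives the placement of each application. The following is a minimal Python sketch of that idea, not the authors' implementation. All names (`Job`, `Gpu`, `predict_interference`, `place`) and the parameters `max_jobs_per_gpu` and `threshold` are hypothetical. The predictor here is a hand-written stand-in that penalizes oversubscription of two made-up utilization metrics; the record says the actual model is trained with ML on GPU metrics, and its form is not given here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    sm_util: float   # assumed metric: share of GPU compute used when run alone (0..1)
    mem_util: float  # assumed metric: share of memory bandwidth used when run alone (0..1)


@dataclass
class Gpu:
    name: str
    jobs: List[Job] = field(default_factory=list)


def predict_interference(a: Job, b: Job) -> float:
    """Hypothetical stand-in for a trained interference model.

    Returns a score that grows when the two jobs together would
    oversubscribe compute or memory bandwidth.
    """
    compute_excess = max(0.0, a.sm_util + b.sm_util - 1.0)
    memory_excess = max(0.0, a.mem_util + b.mem_util - 1.0)
    return compute_excess + memory_excess


def place(job: Job, gpus: List[Gpu],
          max_jobs_per_gpu: int = 2, threshold: float = 0.2) -> Optional[Gpu]:
    """Put the job on the GPU with the lowest predicted interference.

    Returns the chosen GPU, or None if every GPU is full or exceeds
    the interference threshold (the job would then wait in a queue).
    """
    best: Optional[Gpu] = None
    best_cost = 0.0
    for gpu in gpus:
        if len(gpu.jobs) >= max_jobs_per_gpu:
            continue
        cost = sum(predict_interference(job, other) for other in gpu.jobs)
        if cost > threshold:
            continue
        if best is None or cost < best_cost:
            best, best_cost = gpu, cost
    if best is not None:
        best.jobs.append(job)
    return best
```

With two GPUs, a compute-heavy HPC job and a DL training job in this sketch end up on separate GPUs, while a light DL inference job is co-located with whichever resident job it is predicted to disturb least.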