Backfilling HPC Jobs with a Multimodal-Aware Predictor
Authors: | Kenneth Lamar, Damian Dechev, Jim Brandt, Benjamin A. Allan, Christina Peterson, Alexander V. Goponenko |
---|---|
Year of publication: | 2021 |
Subject: | Job scheduler; Schedule; Operations research; Computer science; Turnaround time; Scheduling (computing); Resource (project management); Computer cluster; Quality (business); Duration (project management) |
Source: | CLUSTER |
DOI: | 10.1109/cluster48925.2021.00093 |
Description: | Job scheduling aims to minimize the turnaround time of submitted jobs while respecting the resource constraints of High Performance Computing (HPC) systems. The challenge is that the scheduler must honor job requirements and priorities while actual job run times are unknown. Although approaches have been proposed that use classification techniques or machine learning to predict job run times for scheduling purposes, they do not provide a technique for reducing underprediction, which has a negative impact on scheduling quality. A common cause of underprediction is that the distribution of durations for a job class is multimodal, causing the average job duration to fall below the expected duration of the longer jobs. In this work, we propose the Top Percent predictor, which uses a hierarchical classification scheme to provide better accuracy for job run time predictions than the user-requested time. Our predictor addresses multimodal job distributions by making a prediction that is higher than a specified percentage of the observed job run times. We integrate the Top Percent predictor into scheduling algorithms and evaluate performance using schedule quality metrics found in the literature. To accommodate the user policies of HPC systems, we propose priority metrics that account for job flow time, job resource requirements, and job priority. The experiments demonstrate that the Top Percent predictor outperforms related approaches when evaluated using our proposed priority metrics. |
Database: | OpenAIRE |
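
The abstract's central idea, predicting a run time that exceeds a specified percentage of the observed run times in a job class, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TopPercentSketch`, the two-level hierarchy (user plus job name, then user alone), the `min_samples` threshold, the nearest-rank percentile rule, and the clamping of predictions to the user-requested time are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class TopPercentSketch:
    """Illustrative percentile-style run time predictor (hypothetical, not the paper's code)."""

    def __init__(self, percent=90.0, min_samples=3):
        self.percent = percent          # assumed: share of past run times the prediction must cover
        self.min_samples = min_samples  # assumed: history needed before a class is trusted
        self.history = defaultdict(list)

    @staticmethod
    def _keys(user, job_name):
        # Assumed hierarchy, most specific job class first.
        return [("user+name", user, job_name), ("user", user)]

    def record(self, user, job_name, run_time):
        # Store the observed run time under every class the job belongs to.
        for key in self._keys(user, job_name):
            self.history[key].append(run_time)

    def predict(self, user, job_name, requested_time):
        for key in self._keys(user, job_name):
            samples = self.history.get(key, [])
            if len(samples) >= self.min_samples:
                ordered = sorted(samples)
                # Nearest-rank percentile: smallest observation that is
                # greater than or equal to `percent` percent of the samples.
                rank = max(1, math.ceil(self.percent / 100.0 * len(ordered)))
                # Assumption: never predict more than the user asked for.
                return min(ordered[rank - 1], requested_time)
        # No class has enough history: fall back to the user-requested time.
        return requested_time


predictor = TopPercentSketch(percent=90.0)
for seconds in [60, 60, 60, 60, 3600, 3600]:   # bimodal history for one job class
    predictor.record("alice", "sim", seconds)

mean_time = sum([60, 60, 60, 60, 3600, 3600]) / 6
prediction = predictor.predict("alice", "sim", requested_time=7200)
```

With this bimodal history the mean is 1240 seconds, which would underpredict every long run, whereas the percentile-based prediction lands on the upper mode. The percentile parameter trades underprediction against overly conservative estimates; the value used here is arbitrary.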