Log Analysis-Based Resource and Execution Time Improvement in HPC: A Case Study
Author: | Taeyoung Hong, ChanYeol Park, Heonchang Yu, JunWeon Yoon, Seo Young Noh |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Job scheduler
Computer science; Distributed computing; 02 engineering and technology; computer.software_genre; Execution time; lcsh:Technology; lcsh:Chemistry; Idle; Resource (project management); 0202 electrical engineering, electronic engineering, information engineering; General Materials Science; Instrumentation; Complex problems; lcsh:QH301-705.5; Fluid Flow and Transfer Processes; 020203 distributed computing; lcsh:T; parallel computing; Process Chemistry and Technology; General Engineering; 020206 networking & telecommunications; Root cause; Supercomputer; lcsh:QC1-999; Computer Science Applications; lcsh:Biology (General); lcsh:QD1-999; lcsh:TA1-2040; HPC; backfilling; supercomputer; job scheduling; Inefficiency; lcsh:Engineering (General). Civil engineering (General); computer; lcsh:Physics |
Source: | Applied Sciences, Volume 10, Issue 7; Applied Sciences, Vol 10, Iss 7, p 2634 (2020) |
ISSN: | 2076-3417 |
DOI: | 10.3390/app10072634 |
Description: | High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. This approach can reduce overall job execution time and increase the capacity to solve large-scale, complex problems. On a supercomputer, the job scheduler, HPC's flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the job scheduler's execution log over a certain period of time and propose an optimization approach to reduce the idle time of jobs. Our experiment found that the main root cause of delayed jobs is closely related to resource waiting. The execution time of the entire job is significantly delayed by the growth in idle resources that must be held ready when a large-scale job is submitted. The backfilling algorithm can reduce the inefficiency of these idle resources and help shorten job execution time. We therefore propose applying the backfilling algorithm to the supercomputer. The experimental results show that the overall execution time is reduced. |
Database: | OpenAIRE |
External link: |
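
The description above credits backfilling with putting to use the nodes that sit idle while a large job waits for resources. The record gives no details of the authors' scheduler or workload, so the following is only a minimal sketch of the general idea, in the conservative "EASY" style where a later job may jump ahead only if it does not delay the job at the head of the queue. The `Job` fields, the `schedule` and `makespan` helpers, the 10-node cluster, and the four jobs are all hypothetical and chosen purely for illustration; every job is assumed to be submitted at time 0 with an exactly known runtime.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str      # hypothetical job label, assumed unique
    nodes: int     # nodes requested
    runtime: int   # runtime in arbitrary time units (assumed exact)


def schedule(jobs, total_nodes, backfill=False):
    """Return {job name: start time}; all jobs are submitted at t=0 in list order."""
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the cluster has")
    queue = list(jobs)
    running = []  # min-heap of (end_time, nodes) for running jobs
    free, now, starts = total_nodes, 0, {}

    def launch(job):
        nonlocal free
        starts[job.name] = now
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))

    while queue:
        # First-come, first-served: start queued jobs in order while they fit.
        while queue and queue[0].nodes <= free:
            launch(queue.pop(0))
        if not queue:
            break
        if backfill:
            head = queue[0]
            # Shadow time: earliest moment the blocked head job can start.
            avail, shadow = free, now
            for end, nodes in sorted(running):
                avail, shadow = avail + nodes, end
                if avail >= head.nodes:
                    break
            spare = avail - head.nodes  # nodes left over once the head starts
            waiting = [head]
            for job in queue[1:]:
                ends_in_time = now + job.runtime <= shadow
                # Backfill only if the head job's start is not pushed back.
                if job.nodes <= free and (ends_in_time or job.nodes <= spare):
                    launch(job)
                    if not ends_in_time:
                        spare -= job.nodes
                else:
                    waiting.append(job)
            queue = waiting
        # Advance the clock to the next job completion.
        end, nodes = heapq.heappop(running)
        now, free = end, free + nodes
    return starts


def makespan(jobs, starts):
    """Time at which the last job finishes."""
    return max(starts[j.name] + j.runtime for j in jobs)


# Made-up workload: job B is the large job that blocks the queue.
JOBS = [Job("A", 6, 10), Job("B", 8, 10), Job("C", 4, 5), Job("D", 2, 20)]

if __name__ == "__main__":
    for mode in (False, True):
        s = schedule(JOBS, total_nodes=10, backfill=mode)
        print("backfill" if mode else "fcfs", s, makespan(JOBS, s))
```

With this made-up workload, the large job B starts at the same time under both policies, while the small jobs C and D run on nodes that would otherwise stay idle, so the total completion time drops. Real schedulers work from user-supplied runtime estimates rather than exact runtimes, which this sketch does not model.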