An Integrated Job Monitor, Analyzer and Predictor

Authors:	Ashish Pal, Preeti Malakar
Year of publication:	2021
Subject:	Spectrum analyzer; Database; Computer science; Computer cluster; Job analysis; Start time; State (computer science); Prediction system; Supercomputer; Queue
Source:	CLUSTER
Description:	High-performance computing systems are used for compute-intensive jobs by multiple users. Users submit jobs to batch queues, where the jobs wait for an unknown amount of time until the required resources become available. A large amount of data (submit time, start time, end time, nodes allocated) is collected about these jobs. Analyzing the complex logs of large systems manually is tedious, so it is helpful to analyze the logs automatically in real time and take reactive measures. In this paper, we present a unified job analysis and prediction system for supercomputer jobs. Users and administrators can monitor the current system state, analyze historical data, and predict the wait times of future jobs. We evaluated our wait-time predictors on real job traces from 10 different systems. We observed 92.3% lower average prediction errors compared to existing methods.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::ee987e54062044b927aa44e4e585e87c https://doi.org/10.1109/cluster48925.2021.00091