Exploring Hierarchical Forecasting of Data Popularity in High-Energy Physics Experiments.

Authors: Grigorieva, M. A., Popova, N. N., Vartanov, D. A., Shubin, M. V.
Source: Lobachevskii Journal of Mathematics; Aug2023, Vol. 44 Issue 8, p3076-3090, 15p
Abstract: In high-energy physics, the current large-scale distributed computing environments are responsible for processing and analyzing vast amounts of data. Recently, CERN's total data storage reached an impressive 1 exabyte. This development has presented new challenges for data and workload management systems, which must ensure that resources are balanced and that data is evenly distributed across hundreds of computing centers in a storage-saving environment. To achieve this goal, the demand for data must take into account data popularity. This research explored two approaches to predicting data popularity: regression models (LSTM, Facebook Prophet) for predictive analysis of the popularity of groups of datasets, and classification models (LSTM, FCN, MLP, Logistic Regression, AdaBoost, CATBoost, XGBoost) for predicting the popularity of individual datasets. [ABSTRACT FROM AUTHOR]
Database: Complementary Index