Leveraging Hardware Performance Counters for Predicting Workload Interference in Vector Supercomputers

Autor: Shubham, Takahashi, Keichi, Takizawa, Hiroyuki
Rok vydání: 2024
Předmět:
Druh dokumentu: Working Paper
Popis: In the rapidly evolving domain of high-performance computing (HPC), heterogeneous architectures such as the SX-Aurora TSUBASA (SX-AT) system architecture, which integrate diverse processor types, present both opportunities and challenges for optimizing resource utilization. This paper investigates workload interference within an SX-AT system, with a specific focus on resource contention between Vector Hosts (VHs) and Vector Engines (VEs). Through comprehensive empirical analysis, the study identifies key factors contributing to performance degradation, such as cache and memory bandwidth contention, when jobs with varying computational demands share resources. To address these issues, we develop a predictive model that leverages hardware performance counters (HCs) and machine learning (ML) algorithms to classify and predict workload interference. Our results demonstrate that the model accurately forecasts performance degradation, offering valuable insights for future research on optimizing job scheduling and resource allocation. This approach highlights the importance of adaptive resource management strategies in maintaining system efficiency and provides a foundation for future enhancements in heterogeneous supercomputing environments.
Databáze: arXiv