HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization

Authors:	Matthieu Dorier, Romain Egele, Prasanna Balaprakash, Jaehoon Koo, Sandeep Madireddy, Srinivasan Ramesh, Allen D. Malony, Rob Ross
Contributors:	Argonne National Laboratory [Lemont] (ANL), Université Paris-Saclay, TAckling the Underspecified (TAU), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), University of Oregon [Eugene], ANR-19-CHIA-0022, HUMANIA, Intelligence Artificielle pour Tous (2019)
Language:	English
Publication year:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Bayesian Optimization; I/O; Mochi; Storage; DeepHyper; Machine Learning (cs.LG); Computer Science - Distributed, Parallel, and Cluster Computing; HPC; Autotuning; Transfer Learning; [INFO] Computer Science [cs]; Distributed, Parallel, and Cluster Computing (cs.DC)
Source:	CLUSTER 2022-IEEE International Conference on Cluster Computing (CLUSTER), Sep 2022, Heidelberg, Germany. pp. 381-393, ⟨10.1109/CLUSTER51413.2022.00049⟩
Description:	Distributed data storage services tailored to specific applications have grown popular in the high-performance computing (HPC) community as a way to address I/O and storage challenges. These services offer a variety of specific interfaces, semantics, and data representations. They also expose many tuning parameters, making it difficult for their users to find the best configuration for a given workload and platform. To address this issue, we develop a novel variational-autoencoder-guided asynchronous Bayesian optimization method to tune HPC storage service parameters. Our approach uses transfer learning to leverage prior tuning results and uses a dynamically updated surrogate model to explore the large parameter search space in a systematic way. We implement our approach within the DeepHyper open-source framework and apply it to the autotuning of a high-energy physics workflow on Argonne's Theta supercomputer. We show that our transfer-learning approach enables a more than $40\times$ search speedup over random search, compared with a $2.5\times$ to $10\times$ speedup when not using transfer learning. Additionally, we show that our approach is on par with state-of-the-art autotuning frameworks in speed and outperforms them in resource utilization and parallelization capabilities. Accepted at IEEE Cluster 2022.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::afe01b69b639ebd0abcfe69f48db14cb https://hal.science/hal-03864478