Automating System Configuration of Distributed Machine Learning

Authors: Brian Cho, Joo Yeon Kim, Gyeong-In Yu, Byung-Gon Chun, Hojin Park, Woo-Yeon Lee, Yunseong Lee, Markus Weimer, Beomyeol Jeon, Joo Jeong, Won Wook Song, Gunhee Kim
Year of publication: 2019
Subject:
Source: ICDCS
Description: The performance of distributed machine learning systems depends on their system configuration. However, configuring such a system for optimal performance is challenging and time-consuming, even for experts, because of diverse runtime factors such as the workload and the system environment. We present cost-based optimization to automatically find a good system configuration for parameter server (PS) machine learning (ML) frameworks. We design and implement Cruise, which applies this optimization technique to tune distributed PS ML execution automatically. Evaluation results on three ML applications verify that Cruise automates the system configuration of these applications, achieving good performance with minor reconfiguration costs.
Database: OpenAIRE