Automating System Configuration of Distributed Machine Learning
Authors: | Brian Cho, Joo Yeon Kim, Gyeong-In Yu, Byung-Gon Chun, Hojin Park, Woo-Yeon Lee, Yunseong Lee, Markus Weimer, Beomyeol Jeon, Joo Jeong, Won Wook Song, Gunhee Kim |
---|---|
Year of publication: | 2019 |
Subject: | Training set; business.industry; Computer science; Control reconfiguration; 02 engineering and technology; System configuration; Machine learning; computer.software_genre; Data modeling; 020204 information systems; Server; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer |
Source: | ICDCS |
Description: | The performance of distributed machine learning systems depends on their system configuration. However, configuring a system for optimal performance is challenging and time-consuming, even for experts, because of diverse runtime factors such as the workload and the system environment. We present a cost-based optimization technique that automatically finds a good system configuration for parameter server (PS) machine learning (ML) frameworks. We design and implement Cruise, which applies this technique to tune distributed PS ML execution automatically. Evaluation results on three ML applications verify that Cruise automates the system configuration of the applications and achieves good performance with minor reconfiguration costs. |
Database: | OpenAIRE |