Distributed Matrix Multiplication Performance Estimator for Machine Learning Jobs in Cloud Computing
Author: | Kyungyong Lee, Myungjun Son |
---|---|
Publication year: | 2018 |
Subject: | Mean squared error; Computer science; Estimator; Cloud computing; Workload; 02 engineering and technology; Machine learning; Matrix multiplication; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; 020201 artificial intelligence & image processing; Artificial intelligence |
Source: | IEEE CLOUD |
DOI: | 10.1109/cloud.2018.00088 |
Description: | Matrix multiplication is an important kernel in many machine learning algorithms. As the size of input datasets increases, these workloads are increasingly analyzed in large-scale distributed cloud computing environments. Understanding the characteristics of a distributed matrix multiplication task is therefore essential for running machine learning jobs in the cloud. Herein, we propose Matrix multiplication Performance Estimator for Cloud computing, a method to predict the latency of matrix multiplications of various sizes and shapes in a distributed cloud computing environment. We first characterize the overhead of a distributed matrix multiplication task and propose features to model the latency of tasks with different input types. Using the proposed features, we develop a latency prediction model by iteratively applying a data mining algorithm and a parameter optimization step. In experiments with 236 distinct types of matrix multiplications on diverse cloud instances running Apache Spark, we confirm that the proposed method effectively models the latency of various types of matrix multiplication tasks and captures the non-linear interactions among the proposed features. A comparison with the state-of-the-art cloud computing performance predictor, Ernest, shows that the proposed method achieves 63% lower root mean square error (RMSE) for distributed matrix multiplication latency prediction, confirming that distributed matrix multiplication is a distinctive workload. |
Database: | OpenAIRE |
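The abstract's point that a generic predictor such as Ernest can miss what drives matrix multiplication latency can be illustrated with a minimal Python sketch. It assumes, as Ernest does, a linear model over a constant, `scale/machines`, `log(machines)`, and `machines`, and it takes `scale` to be the number of input elements (an assumption made for this example). The shape-aware terms in `matmul_features` (FLOP count, input size, and output size per worker) are hypothetical illustrations, not the features defined in the paper, and `rmse` is simply the standard error metric named in the comparison.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted latencies."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def ernest_features(scale: float, machines: int) -> List[float]:
    """Ernest-style terms: intercept, scale/machines, log(machines), machines."""
    return [1.0, scale / machines, math.log(machines), float(machines)]


def matmul_features(m: int, k: int, n: int, workers: int) -> List[float]:
    """HYPOTHETICAL shape-aware terms for A (m x k) times B (k x n) on `workers` nodes.

    Illustrative only; these are not the features proposed in the paper.
    """
    flops = 2.0 * m * k * n          # one multiply and one add per inner-product term
    input_elems = m * k + k * n      # elements of A and B that must be read
    output_elems = m * n             # elements of the result C
    return [
        1.0,
        flops / workers,
        input_elems / workers,
        output_elems / workers,
        float(workers),
    ]


if __name__ == "__main__":
    workers = 8
    square = (1000, 1000, 1000)      # 2,000,000 input elements
    skinny = (4000, 250, 4000)       # also 2,000,000 input elements
    for m, k, n in (square, skinny):
        scale = m * k + k * n        # assumption: Ernest's "scale" = input elements
        print((m, k, n),
              ernest_features(scale, workers),
              matmul_features(m, k, n, workers))
```

Running the script prints identical Ernest-style features for the square and the skinny product, even though the skinny product performs four times as many floating-point operations and produces a 16 times larger output. A model fitted to shape-aware terms can separate these two cases, which is the kind of distinction the abstract attributes to a matrix-multiplication-specific estimator.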