Popis: |
Big data processing frameworks (e.g., Spark, Storm) have been extensively used for massive data processing in the industry. To improve the performance and robustness of these frameworks, developers provide users with highly-configurable parameters. Due to the high-dimensional parameter space and complicated interactions of parameters, manual tuning of parameters is time-consuming and ineffective. Building performance-predicting models for big data frameworks is challenging for several reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction model when training data are limited. To meet this challenge, we proposes an auto-tuning configuration parameters system (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS can build a performance prediction model with less training data and without sacrificing model accuracy. Moreover, an optimized Genetic Algorithm (GA) is used in ATCS to explore the parameter space for optimum solutions. To prove the effectiveness of ATCS, we select five frequently-used workloads in Spark, each of which runs on five different sized data sets. The results demonstrate that ATCS improves the performance of five frequently-used Spark workloads compared to the default configurations. We achieved a performance increase of 3.5× on average, with a maximum of 6.9×. To obtain similar model accuracy, experiment results also demonstrate that the quantity of ATCS training data is only 6% of Deep Neural Network (DNN) data, 13% of Support Vector Machine (SVM) data, 18% of Decision Tree (DT) data. Moreover, compared to other machine learning models, the average performance increase of ATCS is 1.7× that of DNN, 1.6× that of SVM, 1.7× that of DT on the five typical Spark programs. |