Description: |
With the tremendous growth of big data, how to extract valuable knowledge from such data has become a main focus for researchers. Traditional analytic platforms and methods are not suited to processing data at this scale. Big data analytics plays a vital role in research on processing unrelated, structured, and unstructured data. Machine learning algorithms need to be improved to exploit the opportunities hidden in big data. In this paper, the performance of the Scalable Random Forest (SRF) algorithm is improved through hyperparameter optimization and a dimension reduction technique. A big data analytics platform is developed using the Hadoop Distributed File System and the Spark processing engine. The performance of the improved SRF is evaluated on real-world data center workload traces, and a model validation process is carried out to obtain reliable error estimates for big data analytics. Our findings show that hyperparameter tuning is critical for different datasets, and that optimizing these parameters yields significantly higher prediction accuracy than the default parameters. |