Performance Evaluation of Random Forest Algorithm in Cluster Environment

Authors: Cinantya Paramita, Catur Supriyanto, Yani Parti Astuti, Lukman Afi Syariffudin, Fauzi Adi Rafrastara
Language: English
Year of publication: 2022
Subject:
DOI: 10.5281/zenodo.5852758
Description: Cluster computing was introduced as an alternative to supercomputers and can address problems that supercomputers cannot handle effectively. In this paper, we evaluate the performance of cluster computing by running a data mining technique in a cluster environment. The experiment predicts flight delays using the random forest algorithm, with Apache Spark as the cluster computing framework. The results show that a cluster of 5 PCs with equal specifications can increase computation performance by up to 39.76% compared to a standalone machine. Attaching more nodes to the cluster can make the process significantly faster. Keywords: cluster computing, random forest, flight delay prediction, PySpark, Apache Spark.
Database: OpenAIRE
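
Note: The kind of experiment the description outlines (training a random forest on Apache Spark, then comparing a standalone run with a cluster run) could be sketched roughly as follows in PySpark. This is a minimal illustration and not the authors' code. The function names, the column names, the CSV path, and the master URLs are all hypothetical, and reading "performance increase" as a percentage reduction in wall-clock training time is an assumption, since the record does not say how the 39.76% figure was computed.

```python
import time


def relative_improvement(standalone_seconds: float, cluster_seconds: float) -> float:
    """Percentage reduction in wall-clock time relative to the standalone run.

    Assumption: this is one plausible reading of "performance increase";
    the source record does not define the metric.
    """
    if standalone_seconds <= 0:
        raise ValueError("standalone_seconds must be positive")
    return 100.0 * (standalone_seconds - cluster_seconds) / standalone_seconds


def time_random_forest(master_url, csv_path, feature_cols, label_col, num_trees=20):
    """Train a random forest on Spark and return the elapsed time in seconds.

    All arguments are hypothetical placeholders; label_col is assumed to
    hold numeric class labels (for example 0 = on time, 1 = delayed).
    """
    # Imported lazily so relative_improvement() works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = (
        SparkSession.builder.master(master_url)
        .appName("rf-flight-delay-sketch")
        .getOrCreate()
    )
    try:
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        # Drop rows with missing values in any column the model uses.
        df = df.dropna(subset=list(feature_cols) + [label_col])
        # Spark ML expects the features packed into a single vector column.
        assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")
        train = assembler.transform(df)
        rf = RandomForestClassifier(
            labelCol=label_col, featuresCol="features", numTrees=num_trees
        )
        # Spark evaluates lazily, so fit() also triggers loading and
        # assembling the data; the timing below includes that work.
        start = time.perf_counter()
        rf.fit(train)
        return time.perf_counter() - start
    finally:
        spark.stop()
```

Under these assumptions, one would call time_random_forest once with a local master such as "local[*]" and once with a standalone cluster master URL of the form "spark://<master-host>:7077", then pass the two timings to relative_improvement. The timings depend entirely on the data and hardware, so no expected values are given here.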