An Empirical Comparison of Six Supervised Machine Learning Techniques on Spark Platform for Health Big Data

Authors: L. D. Dhinesh Babu, Gayathri Nagarajan
Year of publication: 2018
Subject:
Source: Smart Intelligent Computing and Applications ISBN: 9789811319266
Description: Health care is one of the prominent industries that generate voluminous data, creating a need for machine learning techniques combined with big data solutions. The goal of this paper is to (i) compare the performance of six different machine learning techniques on the Spark platform, specifically for health big data, and (ii) discuss the results of experiments conducted on datasets with different characteristics, drawing inferences and conclusions from them. Five benchmark health datasets are considered for experimentation. The metrics chosen for comparison are the accuracy and the computational time of the algorithms, and the experiments are conducted with different proportions of training data. The experimental results show that logistic regression and random forests perform well in terms of accuracy, while naive Bayes outperforms the other techniques in terms of computational time for health big datasets.
Database: OpenAIRE