Constructing Naive Bayesian Classification Model by Spark for Big Data

Authors: Huang Di, Liu Jiawei, Liu Fengyu, Wang Xiaofang, Luo Lan, Zou Qianyin
Year of publication: 2020
Subject:
Source: 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP).
DOI: 10.1109/iccwamtip51612.2020.9317362
Description: With the growth of big data, traditional machine learning algorithms struggle to handle massive datasets. To address this, a naive Bayesian classifier with parallel training and prediction on the Spark platform is proposed. The classifier incorporates Laplace smoothing and normal distribution functions. The Bayesian classification algorithm is combined with the Spark distributed platform to build a fully functional naive Bayesian model for data mining, data analysis, and testing. Experimental results show that, on a dataset of continuous feature vectors, the accuracy of the optimized MLlib-based classifier is 9.75% higher than that of the traditional naive Bayes classification algorithm.
Database: OpenAIRE
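
Note: The description mentions Laplace smoothing and normal distribution functions without giving detail. The following is a minimal single-machine sketch of how those two pieces commonly fit together in a naive Bayes classifier for continuous features. It is not the authors' implementation and does not use Spark. The function names `train` and `predict`, the smoothing parameter `alpha`, and the small variance floor are all assumptions made for illustration. In a distributed setting the per-class counts, sums, and squared sums would presumably be aggregated across partitions rather than computed in one loop.

```python
import math
from collections import defaultdict


def train(samples, labels, alpha=1.0):
    """Fit Laplace-smoothed class priors and per-feature Gaussian parameters.

    `alpha` is the Laplace smoothing constant (hypothetical default of 1.0).
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    n, k = len(samples), len(by_class)
    model = {}
    for y, rows in by_class.items():
        # Laplace-smoothed prior: (count + alpha) / (n + alpha * number_of_classes)
        prior = (len(rows) + alpha) / (n + alpha * k)
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Population variance per feature, with a tiny floor to avoid division by zero
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (prior, means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""

    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the normal density N(v; m, var)
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda y: log_posterior(model[y]))
```

With two well-separated toy classes, `predict(train(samples, labels), x)` returns the label whose Gaussian parameters lie closest to `x`. Working in log space avoids underflow when many features are multiplied together.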