Constructing Naive Bayesian Classification Model by Spark for Big Data

Authors:	Huang Di, Liu Jiawei, Liu Fengyu, Wang Xiaofang, Luo Lan, Zou Qianyin
Year of publication:	2020
Subject:	050101 languages & linguistics; Computer science; business.industry; Feature vector; 05 social sciences; Big data; 02 engineering and technology; Machine learning; computer.software_genre; Normal distribution; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Parallel processing (DSP implementation); Spark (mathematics); Classifier (linguistics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Artificial intelligence; Additive smoothing; business; computer
Source:	2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP).
DOI:	10.1109/iccwamtip51612.2020.9317362
Description:	With the growth of big data, traditional machine learning algorithms struggle to handle massive datasets. To address this problem, a naive Bayesian classifier with parallel training and prediction on the Spark platform is proposed. The classifier includes Laplace smoothing and normal distribution functions. The Bayesian classification algorithm is combined with the Spark distributed platform to build a fully functional naive Bayesian model for data mining, data analysis, and testing. Experimental results show that, on a continuous feature vector dataset, the accuracy of the optimized MLlib-based classifier is 9.75% higher than that of the traditional naive Bayes classification algorithm.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0d7b44211d8fb83b1170468048e16f54 https://doi.org/10.1109/iccwamtip51612.2020.9317362
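
Illustrative note: the record's description names two ingredients, Laplace smoothing and normal distribution functions for continuous features. The sketch below shows one way these can fit together in a naive Bayes classifier. It is not the authors' implementation and does not use Spark or MLlib; it is plain single-machine Python. The function names (`train`, `log_gaussian`, `predict`), the `alpha` parameter, the small variance floor, the choice to apply the smoothing to class priors, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(rows, labels, alpha=1.0):
    """Fit smoothed class priors and per-feature Gaussian parameters."""
    classes = sorted(set(labels))
    n_features = len(rows[0])
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for c in classes:
        members = grouped[c]
        # Laplace (additive) smoothing applied to the class prior (assumption).
        prior = (len(members) + alpha) / (len(rows) + alpha * len(classes))
        stats = []
        for j in range(n_features):
            values = [r[j] for r in members]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # Small floor keeps the density finite when a feature is constant.
            stats.append((mean, var + 1e-9))
        model[c] = (math.log(prior), stats)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(row, stats)
        )
    return max(model, key=score)


# Toy, hand-made data (hypothetical, not from the paper).
rows = [[1.0, 2.1], [0.9, 1.9], [1.1, 2.0], [5.0, 7.9], [5.2, 8.1], [4.8, 8.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = train(rows, labels)
print(predict(model, [1.0, 2.0]))
print(predict(model, [5.1, 8.0]))
```

Working in log space avoids numerical underflow when many per-feature densities are multiplied. In a distributed setting, the per-class counts, sums, and sums of squares would be the quantities aggregated across partitions, but that step is not shown here.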