Constructing Naive Bayesian Classification Model by Spark for Big Data
Author: | Huang Di, Liu Jiawei, Liu Fengyu, Wang Xiaofang, Luo Lan, Zou Qianyin |
---|---|
Year of publication: | 2020 |
Subject: |
050101 languages & linguistics; Computer science; business.industry; Feature vector; 05 social sciences; Big data; 02 engineering and technology; Machine learning; computer.software_genre; Normal distribution; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Parallel processing (DSP implementation); Spark (mathematics); Classifier (linguistics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Artificial intelligence; Additive smoothing; business; computer |
Source: | 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). |
DOI: | 10.1109/iccwamtip51612.2020.9317362 |
Description: | With the growth of big data, traditional machine learning algorithms struggle to handle massive datasets. To solve this problem, a naive Bayesian classifier with parallel training and prediction on the Spark platform is proposed. The classifier incorporates Laplace smoothing and normal distribution functions. The Bayesian classification algorithm is combined with the Spark distributed platform to build a fully functional naive Bayesian model for data mining, data analysis, and testing. Experimental results show that, on a continuous feature vector dataset, the accuracy of the optimized MLlib-based classifier is 9.75% higher than that of the traditional naive Bayes classification algorithm. |
Database: | OpenAIRE |
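
The two ingredients named in the abstract, Laplace smoothing and normal distribution functions for continuous features, can be illustrated with a minimal single-machine sketch. This is not the paper's Spark implementation: the function names (`train`, `predict`, `log_gaussian`), the `alpha` parameter, the small variance floor, the toy data, and the choice to apply smoothing to the class priors are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(rows, labels, alpha=1.0):
    """Fit Laplace-smoothed class priors and per-feature Gaussian parameters."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)

    n_samples, n_classes = len(rows), len(by_class)
    model = {}
    for y, xs in by_class.items():
        # Laplace (additive) smoothing of the class prior (assumed placement).
        prior = (len(xs) + alpha) / (n_samples + alpha * n_classes)
        columns = list(zip(*xs))
        means = [sum(col) / len(col) for col in columns]
        # Small floor keeps the variance strictly positive (assumption).
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (prior, means, variances)
    return model


def log_gaussian(value, mean, variance):
    """Log density of a normal distribution at `value`."""
    return -0.5 * math.log(2 * math.pi * variance) - (value - mean) ** 2 / (2 * variance)


def predict(model, x):
    """Return the class with the highest log posterior under the naive assumption."""
    def score(y):
        prior, means, variances = model[y]
        return math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances)
        )
    return max(model, key=score)


# Hypothetical toy data: two well-separated classes with two continuous features.
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [5.0, 6.1], [5.2, 5.9], [4.9, 6.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = train(rows, labels)
```

With this toy data, `predict(model, [1.1, 2.0])` selects class `"a"` and `predict(model, [5.1, 6.0])` selects class `"b"`. In a distributed setting, the per-class counts, sums, and squared sums would presumably be aggregated across partitions before computing these parameters; that step is omitted here.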