A Web News Classification Method: Fusion Noise Filtering and Convolutional Neural Network

Authors: Yanli Hu, Aixia Zhou, Zhen Tan, Chong Zhang, Bin Ge, Chunhui He
Year of publication: 2020
Subject:
Source: SSPS
DOI: 10.1145/3421515.3421523
Description: As a means of Internet information transfer, web news plays a significant role in information sharing. Web news pages usually contain a large amount of content, but our in-depth analysis found that not all of it is related to the news topic: many pages contain noise content that seriously interferes with the text classification task. How to filter this noise and purify web news content to improve classification accuracy has therefore become a challenging problem. In this paper, we propose NF-CNN, a web news classification method that fuses noise detection, BERT-based semantic similarity noise filtering, and a convolutional neural network. To evaluate the method comprehensively, we test it on a public Chinese news classification dataset. The experimental results demonstrate that the method effectively detects and filters a large amount of noise text and that its average F1 score reaches 95.61% on the web news classification task.
Database: OpenAIRE