Popis: |
Because online social media (OSM) is open and easily accessible, anyone can post a short paragraph of text expressing an opinion on an article they have seen. In the absence of access control mechanisms, many suspicious messages and accounts have been reported spreading across multiple platforms. Identifying and labeling fake news is therefore a demanding problem, given the massive amount of heterogeneous content. Machine learning (ML) and natural language processing (NLP) serve to enhance, speed up, and automate the analytical process, so that unstructured text can be transformed into meaningful data and insights. In this paper, a combination of ML and NLP is implemented to classify fake news using an open, large, labeled corpus from Twitter. We compare several state-of-the-art ML and neural network models based on content-only features. To enhance classification performance, term frequency-inverse document frequency (TF-IDF) features were applied before training the ML models, while word embeddings were used for training the neural networks. With these methods, all the traditional models achieve greater than 85% accuracy, and all the neural network models achieve greater than 90% accuracy. In our experiments, the neural network models outperform the traditional ML models by approximately 6% in precision on average. |
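The TF-IDF step mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term counts, a common smoothed IDF variant, and L2 normalization. The function name `tfidf` and the toy posts are hypothetical and are not taken from the paper or its corpus.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (L2-normalized).

    Assumed weighting: tf * (ln((1 + N) / (1 + df)) + 1), where N is the
    number of documents and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Hypothetical toy posts, for illustration only.
posts = [
    "breaking shocking cure doctors hate",
    "city council approves new budget",
    "shocking secret cure revealed",
]
vectors = tfidf(posts)
```

In this sketch, a term that occurs in fewer posts (such as "breaking") receives a higher weight than one shared across posts (such as "shocking"). The resulting sparse vectors are the kind of content-only features that a traditional classifier could be trained on.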