Designing A Model for Fake News Detection in Social Media Using Machine Learning Techniques

Author: Shalini, A Kumari, Saxena, Sameer, Kumar, B Suresh
Language: English
Year of publication: 2023
Subject:
Source: International Journal of Intelligent Systems and Applications in Engineering; Vol. 11 No. 2s (2023); 218 – 226
ISSN: 2147-6799
Description: Fake identities are a critical problem on social media today. Fake news spreads rapidly through fake identities and bots, which undermines the trustworthiness of social media. Identifying such profiles and accounts with soft computing algorithms is necessary to improve that trustworthiness. In this work, we propose a Recurrent Neural Network (RNN) to identify fake identities on social media. First, we extract data from social media such as Twitter.com using the Twitter API. Hybrid feature extraction is then performed based on the characteristics of the data; it generates training rules associated with fake profiles and with legitimate, human-generated profiles. In the pre-processing and filtration stage, all bot entries are eliminated using a policy-based approach. To generate strict rules that improve classification accuracy, training focuses primarily on attributes such as friends count, total number of followers, tweet count, and retweet count. The RNN categorizes each profile based on training and testing modules. This work focuses on classifying entries as bot or human according to their extracted features using machine learning. Once the training phase is completed, features are extracted from the dataset based on term frequency, and the classification technique is applied to them. The proposed approach is effective in detecting malicious accounts from an imbalanced social media dataset. The system achieves good accuracy in classifying fake and real identities with an RNN using different activation functions, and classification accuracy improves as the number of folds in cross-validation increases. In the experimental analysis, we tested on synthetic and real-time social media datasets, achieving around 96% accuracy on the real-time Twitter dataset and 98% accuracy on the synthetic social media datasets.
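The pre-processing and cross-validation steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Profile` fields mirror the attributes listed in the abstract, but the policy rule in `passes_policy`, the follower-to-friend ratio in `to_features`, the `max_tweets` threshold, and the sample data are all hypothetical. The RNN itself is omitted; the sketch only shows how filtered profiles might be turned into feature vectors and split into folds.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Profile:
    friends_count: int
    followers_count: int
    tweet_count: int
    retweet_count: int
    is_fake: bool  # label: True for a fake profile, False for a legitimate one


def passes_policy(p: Profile, max_tweets: int = 100_000) -> bool:
    """Hypothetical policy rule: reject negative counts and implausible tweet volume."""
    counts = (p.friends_count, p.followers_count, p.tweet_count, p.retweet_count)
    return all(c >= 0 for c in counts) and p.tweet_count <= max_tweets


def to_features(p: Profile) -> List[float]:
    """Map a profile to a numeric vector; the ratio is an illustrative extra feature."""
    ratio = p.followers_count / (p.friends_count + 1)  # +1 avoids division by zero
    return [
        float(p.friends_count),
        float(p.followers_count),
        float(p.tweet_count),
        float(p.retweet_count),
        ratio,
    ]


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


# Made-up sample profiles, for illustration only.
profiles = [
    Profile(120, 340, 1_500, 200, False),
    Profile(5, 2, 90_000, 80_000, True),
    Profile(300, 10, 250_000, 0, True),  # rejected by the hypothetical policy
    Profile(80, 95, 400, 30, False),
    Profile(2_000, 15, 12_000, 11_000, True),
]

kept = [p for p in profiles if passes_policy(p)]
X = [to_features(p) for p in kept]
y = [int(p.is_fake) for p in kept]
folds = list(k_fold_indices(len(kept), k=2))
```

Each `(train, test)` pair in `folds` would index into `X` and `y` to train and evaluate a classifier once per fold. In practice the samples would normally be shuffled, and for an imbalanced dataset stratified, before splitting; that step is left out here to keep the sketch deterministic.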
Database: OpenAIRE