Intelligent Analysis of Arabic Tweets for Detection of Suspicious Messages

Authors: Mohammed A. AlGhamdi, Murtaza Ali Khan
Year of publication: 2020
Subject:
Source: Arabian Journal for Science and Engineering. 45:6021-6032
ISSN: 2191-4281, 2193-567X
Description: With the widespread use of messaging via social networks such as Twitter, Instagram, and Facebook, it is becoming imperative for researchers to devise intelligent systems for data analytics in a range of domains such as business, health, communication, and security. The complex morphological and syntactic structure of Arabic sentences makes them difficult to analyze. This paper presents an intelligent system that analyzes Arabic tweets to detect suspicious messages. We acquired Arabic tweet data from the micro-blogging social network Twitter via the Twitter Streaming Application Programming Interface and saved it in the required file format. The system tokenizes and preprocesses the tweet dataset. The tweets are manually labeled as suspicious (label 1) or not-suspicious (label 0). The labeled tweet dataset is used to train a classifier with supervised machine learning algorithms for the detection of suspicious activities. During the testing phase, the system processes unlabeled tweet data and determines whether each tweet belongs to the suspicious or the not-suspicious class. We tested the system using six supervised machine learning algorithms: (1) decision tree, (2) k-nearest neighbors, (3) linear discriminant algorithm, (4) support vector machine (SVM), (5) artificial neural networks (ANN), and (6) long short-term memory networks. A comparative analysis of the six classifiers in terms of accuracy, execution time, and confusion matrices is presented. Of the six, ANN has the lowest execution speed. In terms of predicting correct results, SVM performs best among all the classifiers and yields 86.72% mean accuracy.
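The tokenization and preprocessing step mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library; the abstract does not list the authors' actual cleaning rules, so the function name `preprocess` and every rule here (dropping URLs and mentions, stripping diacritics and tatweel, normalizing alef variants) are assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical cleaning rules: the abstract does not specify the exact steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, or hamza below.
ALEF_VARIANTS_RE = re.compile("[\u0622\u0623\u0625]")
# A token is a run of Arabic letters (U+0621..U+064A).
TOKEN_RE = re.compile("[\u0621-\u064A]+")


def preprocess(tweet: str) -> List[str]:
    """Clean one tweet and return its Arabic word tokens."""
    text = URL_RE.sub(" ", tweet)              # drop links
    text = MENTION_RE.sub(" ", text)           # drop @user mentions
    text = DIACRITICS_RE.sub("", text)         # strip diacritics and tatweel
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # normalize to bare alef
    return TOKEN_RE.findall(text)              # non-Arabic characters are discarded
```

The resulting token lists would then be turned into feature vectors before training a classifier; that step is not shown because the abstract does not say which representation was used.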
The major outcomes of this work are a labeled dataset of Arabic tweets, an intelligent behavior analysis of tweets that uses six machine learning algorithms to detect suspicious messages, a comparative analysis of those six algorithms, and a statistical benchmark that can be used in future studies on the detection of crimes on social media.
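For readers who want to reproduce this kind of comparison, the accuracy and confusion-matrix figures can be computed as in the sketch below. It assumes binary labels with 1 for suspicious and 0 for not-suspicious, as in the abstract. The helper names are hypothetical, and treating "mean accuracy" as the plain average over several test runs is an assumption, since the abstract does not say how the mean was taken.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (suspicious) as positive.

    Labels are assumed to be exactly 0 or 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of tweets whose predicted label matches the manual label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def mean_accuracy(runs: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average accuracy over several (y_true, y_pred) runs (assumed definition)."""
    if not runs:
        raise ValueError("at least one run is required")
    return sum(accuracy(t, p) for t, p in runs) / len(runs)
```

Reporting the four confusion counts alongside accuracy matters when the suspicious class is rare, because a classifier that labels everything not-suspicious can still score a high accuracy.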
Database: OpenAIRE