A Hybrid CNN-LSTM and XGBoost Approach for Crime Detection in Tweets Using an Intelligent Dictionary.

Author:	Abdalrdha, Zainab Khyioon; Al-Bakry, Abbas Mohsin; Farhan, Alaa K.
Subject:	Criminal investigation; Machine learning; Natural language processing; Deep learning; Encyclopedias & dictionaries; Drug traffic; Law enforcement agencies; User-generated content
Source:	Revue d'Intelligence Artificielle; Dec. 2023, Vol. 37, Issue 6, pp. 1651-1661 (11 pp.)
Abstract:	As social media grows, recognizing and managing illicit content, including threats, harassment, hate speech, armed robbery, drug smuggling, blackmail, and other crimes, is crucial. The present study uses machine learning and deep learning to create an intelligent lexicon for identifying crime-related material in tweets. The Aho-Corasick algorithm is used to build a dictionary that supports efficient search over a large text corpus, categorization, and keyword-based action execution. This strategy accounts for the evolving dynamics of language and criminal vocabulary to improve the detection of crime-related information. The paper aims to gather accurate data that help law enforcement identify and prevent specific crimes. Natural language processing (NLP) techniques are given priority for preprocessing tweets and for extracting relevant information, such as textual patterns and other distinctive features. The paper describes how tweets are labeled into crime categories; the resulting dataset is used to train supervised learning models that classify tweets as criminal or not, combining XGBoost and a hybrid CNN-LSTM. The proposed technique is evaluated using precision, recall, F1-score, accuracy, and MAP. These metrics measure how accurately the model identifies crime-related tweets. An Arabic tweet dataset comprising 18,493 tweets and 10 features is used for model testing. After training, the hybrid CNN-LSTM model achieved an accuracy of 99.84% and a macro F1-score of 98.20%. With XGBoost, the traditional machine learning model achieved a peak macro F1-score of 99.36% and a maximum accuracy of 100%. The results suggest that while the deep learning models outperform the machine learning models in F1-score, XGBoost exhibits superior accuracy. The paper presents a comprehensive strategy for crime detection in tweets, potentially offering a significant tool for law enforcement agencies. [ABSTRACT FROM AUTHOR]
Database:	Complementary Index
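The abstract states that the Aho-Corasick algorithm is used to build a dictionary for searching a large corpus and categorizing text by keyword. The sketch below is a textbook Aho-Corasick matcher (trie plus breadth-first failure links) that maps each keyword to a crime category and tags a tweet with every category whose keyword occurs in it, in a single pass over the text. It is not the authors' implementation: the class `AhoCorasick`, the function `categorize`, and the `LEXICON` entries are hypothetical, and the English keywords are placeholders, since the paper's Arabic lexicon is not reproduced in this record.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick matcher: a keyword trie with failure links."""

    def __init__(self, keywords):
        # keywords: dict mapping keyword (str) -> category label (str)
        self.goto = [{}]   # goto[state][char] -> child state
        self.fail = [0]    # failure link of each state (root fails to itself)
        self.out = [[]]    # (keyword, category) pairs recognized at each state
        for word, category in keywords.items():
            self._add(word, category)
        self._build_failure_links()

    def _add(self, word, category):
        # Walk/extend the trie one character at a time.
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append((word, category))

    def _build_failure_links(self):
        # Breadth-first order guarantees shallower states are finished first.
        queue = deque(self.goto[0].values())  # depth-1 states keep fail = root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target (suffix matches).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (end_index, keyword, category) for every keyword occurrence."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word, category in self.out[state]:
                yield i, word, category


def categorize(text, matcher):
    """Return the set of categories whose keywords occur anywhere in text."""
    return {category for _, _, category in matcher.search(text.lower())}


# Hypothetical placeholder lexicon (the study's Arabic lexicon is not shown here).
LEXICON = {
    "armed robbery": "armed_robbery",
    "robbery": "armed_robbery",
    "smuggle": "drug_smuggling",
    "cocaine": "drug_smuggling",
    "blackmail": "blackmail",
}
matcher = AhoCorasick(LEXICON)
```

For example, `categorize("Police foiled an armed robbery and a plan to smuggle cocaine", matcher)` returns `{"armed_robbery", "drug_smuggling"}`. The matcher works on raw substrings, so "smuggler" would also match "smuggle"; a real crime lexicon would likely need tokenization or word-boundary checks, and Arabic text would additionally need normalization of letter variants, which this sketch does not attempt.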