Evading obscure communication from spam emails

Authors: Khan Farhan Rafat, Qin Xin, Abdul Rehman Javed, Zunera Jalil, Rana Zeeshan Ahmad
Language: English
Year of publication: 2022
Subject:
Source: Mathematical Biosciences and Engineering, Vol 19, Iss 2, Pp 1926-1943 (2022)
Document type: article
ISSN: 1551-0018
DOI: 10.3934/mbe.2022091
Description: Spam is any form of annoying, unsolicited digital communication sent in bulk; it may contain offensive content and spread viruses and cyber-attacks. The sharp increase in spam volume has made it necessary to develop more reliable and robust artificial intelligence-based anti-spam filters. Besides text, an email sometimes contains multimedia content such as audio, video, and images. However, text-centric spam filtering based on text classification techniques remains the preferred choice today. In this paper, we show that text pre-processing techniques nullify the detection of malicious content in an obscure communication framework. We use the SpamAssassin corpus with and without text pre-processing and examine it using machine learning (ML) and deep learning (DL) algorithms to classify emails as ham or spam. The proposed DL-based approach consistently outperforms the ML models. In the first stage, with pre-processing, the long short-term memory (LSTM) model achieves the best results: 93.46% precision, 96.81% recall, and a 95% F1-score. In the second stage, without pre-processing, LSTM again achieves the best results: 95.26% precision, 97.18% recall, and a 96% F1-score. The results show that DL algorithms outperform standard ML algorithms in filtering spam. However, the results for detecting encrypted communication are unsatisfactory for both classes of algorithms.
Database: Directory of Open Access Journals
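
Note: the F1-scores quoted in the description can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that check, not the authors' code; the function name `f1_score` and the dictionary `reported` are hypothetical names chosen for illustration, and the only inputs are the percentages stated in the description.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LSTM figures as quoted in the description, converted from percent to fractions.
# The stage labels are illustrative; they are not identifiers from the paper.
reported = {
    "with pre-processing": (0.9346, 0.9681),
    "without pre-processing": (0.9526, 0.9718),
}

for stage, (precision, recall) in reported.items():
    print(f"{stage}: F1 = {f1_score(precision, recall):.2%}")
```

Rounded to whole percentages, the two values come out at about 95% and 96%, which agrees with the F1-scores given in the description.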