Multi-Language Spam/Phishing Classification by Email Body Text: Toward Automated Security Incident Investigation

Authors:	Ivan Suzdalev, Kornelija Tunaitytė, Antanas Čenys, Justinas Janulevičius, Simona Ramanauskaitė, Justinas Rastenis
Language:	English
Year of publication:	2021
Subject:	Prioritization; Computer Networks and Communications; Computer science; lcsh:TK7800-8360; 02 engineering and technology; phishing; multi-language emails; Email classification; spam; 0202 electrical engineering, electronic engineering, information engineering; Multi language; Electrical and Electronic Engineering; Information retrieval; InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS; lcsh:Electronics; 020207 software engineering; Lithuanian; augmented dataset; Phishing; language.human_language; Spamming; Body text; ComputingMethodologies_PATTERNRECOGNITION; classification; Hardware and Architecture; Control and Systems Engineering; Signal Processing; language; 020201 artificial intelligence & image processing
Source:	Electronics, Volume 10, Issue 6, p. 668 (2021)
ISSN:	2079-9292
DOI:	10.3390/electronics10060668
Description:	Spamming and phishing are two types of unwanted, annoying email that differ in their potential threat and impact to the user. Automated classification of these categories can increase users' awareness and can be used for incident investigation prioritization or automated fact gathering. However, there are currently no scientific papers focusing on email classification between these two categories, spam and phishing. This paper therefore presents a solution based on automated classification of the email message body text into spam and phishing emails. We apply the proposed solution to classify emails written in three languages: English, Russian, and Lithuanian. Because most public email datasets collect English emails almost exclusively, we investigate whether automated dataset translation is suitable for adapting a dataset to the classification of emails written in other languages. Experiments on the limitations of using a public dataset for a specific organization are carried out to evaluate the need for dataset updates to achieve more accurate classification results.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::42a5ac32d2897909c95a98fcf4b0beb9