Phishing URL detection with neural networks: an empirical study

Authors:	Hayk Ghalechyan, Elina Israyelyan, Avag Arakelyan, Gerasim Hovhannisyan, Arman Davtyan
Language:	English
Year of publication:	2024
Subject:	Phishing URL; Neural network; Probabilistic neural networks; Uncertainty; Empirical data; Medicine; Science
Source:	Scientific Reports, Vol 14, Iss 1, Pp 1-12 (2024)
Document type:	article
ISSN:	2045-2322
DOI:	10.1038/s41598-024-74725-6
Description:	Abstract: Cybercriminals create phishing websites that mimic legitimate websites to obtain sensitive information from companies, individuals, or governments. It is therefore imperative to use state-of-the-art artificial intelligence and machine learning technologies to correctly classify phishing and legitimate URLs. We report the results of applying deterministic and probabilistic neural network models to URL classification. The key achievements of this work are: (1) the development of a unique approach based on probabilistic neural networks that improves classification accuracy; and (2) a demonstration, for the first time in URL phishing research, that a machine learning model trained on a combination of open-source and private datasets is successful in production. The dataset is constructed from open sources such as Alexa, PhishTank, and OpenPhish and, most importantly, from real-world production data from EasyDMARC. Daily validation of the model on newly reported URLs and their labels, drawn from both open-source platforms and private production data, reaches 97% accuracy on average. The validation labels come from PhishTank, OpenPhish, and EasyDMARC; mislabeled entries cannot be ruled out, and the large number of URLs made it impossible to check them. Feature engineering was done without third-party dependencies. Lastly, the evaluation of both deterministic and probabilistic models shows high accuracy on short and long URLs, where short URLs are defined as having fewer than 50 characters.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/427fb1a9d62842078cacd0e7b925cb2e
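
The abstract defines short URLs as those with fewer than 50 characters and notes that feature engineering used no third-party dependencies. As a minimal sketch of what that could look like, the Python snippet below uses only the standard library. The function names and the specific lexical features are hypothetical illustrations, not the authors' actual feature set, which this record does not describe.

```python
from urllib.parse import urlsplit

# From the abstract: short URLs have fewer than 50 characters.
SHORT_URL_MAX_LEN = 50


def is_short_url(url: str) -> bool:
    """Return True if the URL counts as short under the abstract's definition."""
    return len(url) < SHORT_URL_MAX_LEN


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features using only the standard library.

    Assumption: these features are illustrative examples, not the paper's features.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "is_short": is_short_url(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_sign": "@" in url,
        "uses_https": parts.scheme == "https",
    }
```

Calling `lexical_features("https://login.example.com/a1")` returns a plain dictionary that could then be split into short and long URL groups before evaluation, mirroring the short/long comparison mentioned in the abstract.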