Detecting Ambiguous Phishing Certificates using Machine Learning

Authors: Sajad Homayoun, Kaspar Hageman, Sam Afzal-Houshmand, Christian D. Jensen, Jens M. Pedersen
Language: English
Year of publication: 2022
Subject:
Source: Hageman, K, Homayoun, S, Afzal-Houshmand, S, Jensen, C D & Pedersen, J M 2022, Detecting Ambiguous Phishing Certificates using Machine Learning. in Proceedings of 36th International Conference on Information Networking. IEEE, 36th International Conference on Information Networking, Jeju Island, Korea, Republic of, 12/01/2022.
Homayoun, S, Hageman, K D, Afzal-Houshmand, S, Jensen, C D & Pedersen, J M 2022, Detecting Ambiguous Phishing Certificates using Machine Learning. in 2022 International Conference on Information Networking (ICOIN). IEEE, pp. 1-6, 2022 International Conference on Information Networking (ICOIN), Jeju-si, Korea, Republic of, 12/01/2022. https://doi.org/10.1109/ICOIN53446.2022.9687264
DOI: 10.1109/ICOIN53446.2022.9687264
Description: Recent phishing attacks have started to migrate to HTTP over TLS (HTTPS), making a phishing web page appear safe to the user's browser despite its malicious purpose. This paper uses features derived from both digital certificates and domain-related data to propose machine learning-based solutions that predict whether a digital certificate used in HTTPS is phishing or benign. In contrast to previous work that treats this as a binary classification problem, we take into account that a certificate can be partially benign and partially phishing at the same time. We propose a multi-class classifier and a regressor to classify these ambiguous certificates, in addition to benign and phishing certificates; for the regressor, the 'phishyness' of a certificate is expressed as a value between 0 and 1. We apply our method to a set of certificates obtained from certificate transparency logs and show that we can classify them with high performance. We extend our validation by evaluating the performance of the model over time, showing that our model generalizes over time on our training data set.
Database: OpenAIRE