SHAP Interpretations of Tree and Neural Network DNS Classifiers for Analyzing DGA Family Characteristics

Authors:	Nikos Kostopoulos, Dimitris Kalogeras, Dimitris Pantazatos, Maria Grammatikou, Vasilis Maglaris
Language:	English
Year of publication:	2023
Subject:	Cybersecurity; domain generation algorithms (DGAs); domain name system (DNS); explainable artificial intelligence (XAI); machine learning; Shapley additive explanation (SHAP); Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol. 11, pp. 61144-61160 (2023)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3286313
Description:	Domain Generation Algorithms (DGAs) have been employed by botnet orchestrators for controlling infected hosts (bots), while evading detection by performing multiple DNS requests, mostly for non-existing domain names. With blacklists ineffective, modern DGA filtering methods rely on Machine Learning (ML). Emerging needs for higher intrusion detection accuracy lead to complex, non-interpretable black-box classifiers, thus requiring eXplainable Artificial Intelligence (XAI) techniques. In this paper, we utilize SHapley Additive exPlanation (SHAP) to derive model-agnostic, post-hoc interpretations on DGA name classifiers. This method is applied to binary supervised tree-based classifiers (e.g. eXtreme Gradient Boosting - XGBoost) and deep neural networks (Multi-Layer Perceptron - MLP) to assess domain name feature importance. SHAP visualization tools (summary, dependence, force plots) are used to rank features, investigate their effect on model decisions and determine their interactions. Specific interpretations are detailed for identifying names belonging to common DGA families pertaining to arithmetic, wordlist, hash and permutation based schemes. Learning and interpretations are based on up-to-date datasets, such as Tranco for benign and DGArchive for malicious names. Domain name features are extracted from dataset instances, thus limiting time-consuming and privacy-invasive database operations on historical data. Our experimental results demonstrate that SHAP enables explanations of XGBoost (the most accurate tree-based model) and MLP classifiers and indicates the characteristics of specific DGA schemes, commonly employed in attacks. In conclusion, we envision that XAI methods will expedite ML deployment in networking environments where justifications for black-box models are required.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/f0b8e316cc3c4ae8807983cada45fb47