Detecting cybersecurity attacks across different network features and learners

Authors: Joffrey L. Leevy, John Hancock, Richard Zuech, Taghi M. Khoshgoftaar
Language: English
Year of publication: 2021
Source: Journal of Big Data, Vol 8, Iss 1, Pp 1-29 (2021)
Document type: article
ISSN: 2196-1115
DOI: 10.1186/s40537-021-00426-w
Description (abstract): Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is a big data set (about 16,000,000 instances) that is publicly available, modern, and covers a wide range of realistic attack types. Our contribution centers on answering three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” All three questions are answered in the affirmative, and the answers provide practical information for developing an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
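The abstract does not say how the ensemble feature selection is carried out. One common form ranks features with several independent rankers and aggregates the rankings into a single ordering. The sketch below assumes mean-rank aggregation followed by a top-k cutoff; this may differ from the authors' actual procedure, and the function names, scores, and feature labels are hypothetical.

```python
from statistics import mean


def rank_features(scores):
    """Convert one ranker's {feature: score} into {feature: rank}.

    Rank 1 is the highest score; ties are broken alphabetically so the
    result is deterministic. (Hypothetical helper for illustration.)
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: i + 1 for i, f in enumerate(ordered)}


def ensemble_select(score_tables, k):
    """Assumed mean-rank ensemble: average each feature's rank across
    all rankers and keep the k features with the lowest average rank."""
    rankings = [rank_features(t) for t in score_tables]
    features = rankings[0].keys()
    # Every ranker must score the same feature set for ranks to be comparable.
    if any(r.keys() != features for r in rankings):
        raise ValueError("all rankers must score the same features")
    avg_rank = {f: mean(r[f] for r in rankings) for f in features}
    return sorted(features, key=lambda f: (avg_rank[f], f))[:k]


# Hypothetical importance scores from three rankers over four features.
rankers = [
    {"A": 0.9, "B": 0.5, "C": 0.1, "D": 0.3},
    {"A": 0.7, "B": 0.8, "C": 0.2, "D": 0.1},
    {"A": 0.6, "B": 0.4, "C": 0.5, "D": 0.2},
]

print(ensemble_select(rankers, 2))  # ['A', 'B']
```

Averaging ranks rather than raw scores keeps a ranker whose scores live on a larger scale from dominating the ensemble; the selected subset would then be used to train each classifier before comparing AUC and F1-score.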
Database: Directory of Open Access Journals