Network intrusion detection using oversampling technique and machine learning algorithms.
Author: | Ahmed HA; Department of Computer Science and Software Engineering, Jinnah University for Women, Karachi, Sindh, Pakistan., Hameed A; Department of Computer Science and Software Engineering, Jinnah University for Women, Karachi, Sindh, Pakistan., Bawany NZ; Department of Computer Science and Software Engineering, Jinnah University for Women, Karachi, Sindh, Pakistan. |
---|---|
Language: | English |
Source: | PeerJ. Computer science [PeerJ Comput Sci] 2022 Jan 07; Vol. 8, pp. e820. Date of Electronic Publication: 2022 Jan 07 (Print Publication: 2022). |
DOI: | 10.7717/peerj-cs.820 |
Abstract: | The expeditious growth of the World Wide Web and the rampant flow of network traffic have resulted in a continuous increase of network security threats. Cyber attackers seek to exploit vulnerabilities in network architecture to steal valuable information or disrupt computer resources. A Network Intrusion Detection System (NIDS) is used to detect various attacks, thus providing timely protection to network resources. To implement a NIDS, a range of supervised and unsupervised machine learning approaches is applied to detect irregularities in network traffic and to address network security issues. Such NIDSs are trained using various datasets that include attack traces. However, due to the advancement in modern-day attacks, these systems are unable to detect emerging threats. Therefore, a NIDS needs to be trained and developed with a modern, comprehensive dataset that contains contemporary common and attack activities. This paper presents a framework in which different machine learning classification schemes are employed to detect various categories of network attack. Five machine learning algorithms (Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors and Artificial Neural Networks) are used for attack detection. This study uses a dataset published by the University of New South Wales (UNSW-NB15), a relatively new dataset that contains a large amount of network traffic data with nine categories of network attacks. The results show that the classification models achieved a highest accuracy of 89.29%, obtained with the Random Forest algorithm. Further improvement in the accuracy of the classification models is observed when the Synthetic Minority Oversampling Technique (SMOTE) is applied to address the class imbalance problem. After applying SMOTE, the Random Forest classifier showed an accuracy of 95.1% with 24 selected features from the Principal Component Analysis method. 
Competing Interests: The authors declare that they have no competing interests. (© 2022 Ahmed et al.) |
Database: | MEDLINE |
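
The abstract names SMOTE as the step that addresses class imbalance but does not show it. The following is a minimal standard-library sketch of the core idea only: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance`, the toy two-feature data, and the default `k=5` are illustrative assumptions, not the authors' code; the PCA and Random Forest stages are omitted, and a real experiment would more likely use an existing implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from the rows of one minority class.

    Each new row is an interpolation between a randomly chosen minority
    row and one of its k nearest neighbours within the same class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            extra = smote(rows, target - len(rows), k=k, seed=seed)
            X_out.extend(extra)
            y_out.extend([label] * len(extra))
    return X_out, y_out


# Hypothetical toy data: six "normal" flows and two "attack" flows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.1], [0.2, 0.3],
     [5.0, 5.0], [5.2, 5.1]]
y = ["normal"] * 6 + ["attack"] * 2

X_bal, y_bal = balance(X, y)
print(y_bal.count("normal"), y_bal.count("attack"))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the data used to measure accuracy.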