An Adaptive Behavioral-Based Incremental Batch Learning Malware Variants Detection Model Using Concept Drift Detection and Sequential Deep Learning

Author:	Jemal H. Abawajy, Sultan Alanazi, Afrah Y. AL-Rezami, Fuad A. Ghaleb, Asma A. Alhashmi, Abdulbasit Darem
Publication year:	2021
Subject:	Software_OPERATINGSYSTEMS; General Computer Science; Concept drift; Computer science; Feature extraction; adaptive incremental batch learning; computer.software_genre; Machine learning; Classifier (linguistics); statistical process control; General Materials Science; Application programming interface; business.industry; Deep learning; General Engineering; deep learning; Static analysis; Statistical process control; TK1-9971; ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS; Malware variant detection; Malware; concept drift detection; Electrical engineering. Electronics. Nuclear engineering; Artificial intelligence; business; computer
Source:	IEEE Access, Vol 9, Pp 97180-97196 (2021)
ISSN:	2169-3536
DOI:	10.1109/access.2021.3093366
Description:	Malware variants are a major emerging cybersecurity threat because of the damage they can cause to computer systems. Many solutions have been proposed for detecting malware variants. However, accurate detection is challenging because malware variants evolve constantly, which causes concept drift. Existing malware detection solutions assume that the mapping learned from historical malware features will remain valid for new and future malware. The relationship between input features and the class label has been treated as stationary, which does not hold for ever-evolving malware variants. Malware features change dynamically due to code obfuscation, mutation, and modifications that malware authors make to shift the feature distribution and evade detection, rendering the detection model obsolete and ineffective. This study presents an Adaptive behavioral-based Incremental Batch Learning Malware Variants Detection model using concept drift detection and sequential deep learning (AIBL-MVD) to accommodate new malware variants. Malware behaviors were extracted through dynamic analysis by running the malware files in a sandbox environment and collecting their Application Programming Interface (API) traces. The malware samples were sorted by the time of their first appearance to capture how the variants' characteristics change. The base classifier was then trained on a subset of historical malware samples using a sequential deep learning model. To address the catastrophic forgetting problem of incremental learning, new malware samples were mixed with a subset of old data and gradually introduced to the learning model through incremental learning with an adaptive batch size. A statistical process control technique was used to detect concept drift, which signals when the model should be incrementally updated and reduces the frequency of model updates.
Results from extensive experiments show that the proposed model outperforms the static model, periodic retraining approaches, and the fixed batch size incremental learning approach in detection rate and efficiency. The model maintains an average detection accuracy of 99.41% on new malware and malware variants with a low update frequency of 1.35 times per month.
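The abstract does not give the exact drift rule or batch construction, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a generic statistical-process-control rule in the style of the DDM drift detector (track the classifier's running error rate and flag drift when it rises about three standard deviations above its best observed level), plus a simple replay step that mixes new samples with a random subset of old ones. The names `SpcDriftDetector` and `build_update_batch`, the thresholds, and the replay ratio are all hypothetical.

```python
import math
import random


class SpcDriftDetector:
    """DDM-style statistical process control over a stream of 0/1 errors.

    Assumption: this is a generic SPC rule, not the rule used in the paper.
    """

    def __init__(self, warn_sigmas=2.0, drift_sigmas=3.0, min_samples=30):
        self.warn_sigmas = warn_sigmas
        self.drift_sigmas = drift_sigmas
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # Forget all statistics; called after a drift is signalled.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(bool(is_error))
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)   # its binomial standard deviation
        if self.n < self.min_samples:
            return "stable"                     # too few samples to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s       # remember the best level seen
        if p + s > self.p_min + self.drift_sigmas * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_sigmas * self.s_min:
            return "warning"
        return "stable"


def build_update_batch(new_samples, old_samples, replay_ratio=0.5, seed=0):
    """Mix new samples with a random subset of old ones (hypothetical helper).

    Replaying old data during an incremental update is one common way to
    limit catastrophic forgetting; the ratio here is an arbitrary choice.
    """
    rng = random.Random(seed)
    k = min(len(old_samples), int(len(new_samples) * replay_ratio))
    batch = list(new_samples) + rng.sample(list(old_samples), k)
    rng.shuffle(batch)
    return batch
```

In use, each prediction outcome on time-ordered samples would be passed to `update`, and only a `"drift"` result would trigger an incremental update on a batch from `build_update_batch`. Updating on drift rather than on a fixed schedule is what keeps the update frequency low.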
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6e6f362f68a9c9d952ae744ac2f4c722 https://doi.org/10.1109/access.2021.3093366