SFCWGAN-BiTCN with Sequential Features for Malware Detection
Author: | Bona Xuan, Jin Li, Yafei Song |
---|---|
Language: | English |
Publication year: | 2023 |
Subject: | malware classification; selection feature conditional Wasserstein generative adversarial network; bidirectional temporal convolutional network; whale optimization algorithm extreme gradient boosting; Technology; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999 |
Source: | Applied Sciences, Vol 13, Iss 4, p 2079 (2023) |
Document type: | article |
ISSN: | 2076-3417 |
DOI: | 10.3390/app13042079 |
Description: | In the field of adversarial attacks, generative adversarial networks (GANs) have shown strong performance. However, few studies have applied them to malware sample supplementation because discrete data are difficult to handle. More importantly, imbalanced malware family samples interfere with the analytical power of malware detection models and mislead malware classification. To address the impact of malware family imbalance on accuracy, a selection feature conditional Wasserstein generative adversarial network (SFCWGAN) and a bidirectional temporal convolutional network (BiTCN) are proposed. First, we extract malware Opcode and API sequence features and represent them with Word2Vec, emphasizing the semantic logic of the API call and Opcode call sequences. Second, the Spearman correlation coefficient and the whale optimization algorithm extreme gradient boosting (WOA-XGBoost) algorithm are combined to select features, filter out invalid features, and simplify the structure. Finally, we propose a GAN-based sequence feature generation algorithm: samples are generated with the conditional Wasserstein generative adversarial network (CWGAN) on the imbalanced malware family dataset, added to the training set to supplement the samples, and the augmented set is used to train BiTCN. In tests on the Kaggle and DataCon datasets, the model achieved detection accuracies of 99.56% and 96.93%, respectively, which were 0.18% and 2.98% higher than those of the compared methods. |
Database: | Directory of Open Access Journals |
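The abstract does not specify how the Spearman correlation coefficient is used in feature selection. The sketch below assumes one common reading: compute pairwise Spearman rank correlations between candidate features and greedily drop any feature whose absolute correlation with an already-kept feature exceeds a threshold. The function names, the 0.9 threshold, and the keep-first greedy order are illustrative assumptions, not the authors' procedure, and the WOA-XGBoost stage is not shown.

```python
from typing import Dict, List, Sequence


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def drop_redundant_features(
    features: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[str]:
    """Hypothetical filter: keep features in dict order, skipping any whose
    |rho| with an already-kept feature exceeds `threshold` (assumed 0.9)."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy columns: f2 is a monotone copy of f1, f3 is only partly correlated.
    toy = {
        "f1": [1, 2, 3, 4, 5],
        "f2": [2, 4, 6, 8, 10],
        "f3": [5, 3, 4, 1, 2],
    }
    print(drop_redundant_features(toy))  # → ['f1', 'f3']
```

Here f2 is dropped because its rank order matches f1 exactly (rho = 1), while f3 is kept (rho = -0.8). In practice `scipy.stats.spearmanr` computes the same coefficient; the pure-Python version only keeps the sketch dependency-free.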