An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding

Authors: Norziana Jamil, Nowshad Amin, Kwok-Yan Lam, Mohammad Shamsul Hoque
Contributors: School of Computer Science and Engineering, Nanyang Technopreneurship Center
Year of publication: 2021
Subjects:
Computer science
Vulnerability
Context (language use)
Cloud computing
02 engineering and technology
TP1-1185
Overfitting
Machine learning
computer.software_genre
Biochemistry
Article
Analytical Chemistry
Unique identifier
Machine Learning
Resource (project management)
0202 electrical engineering, electronic engineering, information engineering
Electrical and Electronic Engineering
Instrumentation
Cloud Security Management
supervised machine learning
modelling and prediction
Computer Security
business.industry
Chemical technology
National Vulnerability Database
Reproducibility of Results
020207 software engineering
Atomic and Molecular Physics, and Optics
vulnerability exploitation prediction
cost function
CVSS
Computer science and engineering [Engineering]
020201 artificial intelligence & image processing
Supervised Machine Learning
Artificial intelligence
cloud security management
Neural Networks
Computer
business
computer
Algorithms
Source: Sensors, Vol 21, Iss 12, p 4220 (2021)
Sensors (Basel, Switzerland)
Sensors
Volume 21
Issue 12
ISSN: 1424-8220
0105-1784
Description: Successful cyber-attacks are caused by the exploitation of vulnerabilities in the software and/or hardware of systems deployed on premises or in the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between the numbers of exploited and non-exploited vulnerabilities. The open-source National Vulnerability Database (NVD), the largest repository that indexes and maintains all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity score (CVSS) based on the impact it might inflict if compromised. Recent research has shown that the CVSS score is not the only factor in selecting a vulnerability for exploitation, and that other attributes in the NVD can be used effectively as predictive features to identify the most exploitable vulnerabilities. Since cybersecurity management is highly resource-intensive, organizations such as cloud system operators will benefit if the vulnerabilities in their system software or hardware that are most likely to be exploited can be predicted as accurately and reliably as possible, so that the available resources can be used to fix those first. Various existing works have developed vulnerability exploitation prediction models that address the class imbalance with algorithmic and artificial data resampling techniques, but these models still suffer greatly from overfitting to the majority class, which renders them unreliable in practice. In this research, we designed a novel cost function to address the class imbalance. We also used the large text corpus in the extracted dataset to develop a custom-trained word vector that better captures the context of the local text data, for use as an embedding layer in neural networks.
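The record does not specify the authors' novel cost function. As a hedged illustration of the general idea of cost-sensitive learning under this kind of imbalance, the sketch below implements a standard class-weighted binary cross-entropy with inverse-frequency weights. This is a common, well-known technique swapped in for illustration, not the paper's method; label 1 stands for an exploited vulnerability, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical helper names; label 1 = exploited, 0 = not exploited.
# Standard inverse-frequency class weighting, NOT the paper's novel cost function.


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight class c by n / (2 * n_c), so both classes carry equal total weight."""
    n = len(labels)
    weights = {}
    for c in (0, 1):
        n_c = sum(1 for y in labels if y == c)
        if n_c == 0:
            raise ValueError(f"class {c} does not occur in labels")
        weights[c] = n / (2 * n_c)
    return weights


def weighted_bce(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    weights: Dict[int, float],
    eps: float = 1e-12,
) -> float:
    """Mean class-weighted binary cross-entropy over all samples."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        if y == 1:
            total -= weights[1] * math.log(p)  # penalty for missing an exploit
        else:
            total -= weights[0] * math.log(1.0 - p)  # penalty for a false alarm
    return total / len(y_true)


if __name__ == "__main__":
    # Toy data with a 9:1 imbalance (hypothetical, not from the paper).
    labels = [0] * 9 + [1]
    w = inverse_frequency_weights(labels)
    print(w)  # {0: 0.5555555555555556, 1: 5.0}
    # A model that always predicts a low exploitation probability.
    print(weighted_bce(labels, [0.1] * 10, w))
```

With these weights, the single exploited sample carries as much total weight as the nine non-exploited ones, so a model cannot drive the loss down simply by favouring the majority class.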
Our vulnerability exploitation prediction models, powered by the novel cost function and the custom-trained word vector, achieved very high overall performance, with accuracy, precision, recall, F1-score and AUC values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, outperforming existing models while overcoming the overfitting problem caused by the class imbalance.
Published version.
This research is supported by the BOLD Publication Fund 2021 and the Yayasan Canselor Uniten (YCU) Grant (project code RJ010517844/06), and partly supported by the TNB Seed Fund 2019-2020 (project code U-TC-RD-19-09). We also thank the ICT Ministry of Bangladesh for its support.
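The metrics reported in the abstract follow their standard definitions. The sketch below, with hypothetical function names and toy data, computes them from binary labels and predictions and estimates AUC as the probability that a randomly chosen exploited vulnerability is scored above a randomly chosen non-exploited one. It also shows why accuracy alone is misleading under class imbalance: a model that always predicts the majority class can score high accuracy with zero recall.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (exploited) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and equally long")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Majority-class baseline on 9:1 toy data: high accuracy, zero recall.
    y = [0] * 9 + [1]
    print(classification_metrics(y, [0] * 10))
    # {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
    print(roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]))  # 8/9
```

The pairwise AUC loop is quadratic in the number of samples and is meant only to make the definition concrete; for large datasets a rank-based computation is the usual choice.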
Database: OpenAIRE