Description: |
While an ever-growing number of vulnerabilities are reported every day, not all reported vulnerabilities are equal: some are targeted far more often than others. Correctly estimating the likelihood that a vulnerability will be exploited is a critical task for system administrators in cyber security management, as it helps them prioritise and patch the right vulnerabilities. In practice, however, predicting exploitation is challenging because of three key issues in the datasets: the unavailability of labeled training data, high dimensionality, and class imbalance. Existing methods often address only one of these issues at a time. In this paper, we propose a method called OutCenTR that predicts the likelihood of a vulnerability being exploited by addressing all three issues concurrently. OutCenTR handles each issue as follows. It addresses the lack of labeled training data with a semi-supervised approach that requires only a small amount of labeled data. It addresses high dimensionality by identifying and removing insignificant features. It addresses class imbalance by introducing context-based distinguishability scores between records. OutCenTR first determines the important features in a dataset and then uses an existing algorithm to build a classifier on those features only. The resulting classifier is effective not only for predicting the exploitation of vulnerabilities but also for general-purpose outlier detection. We evaluate OutCenTR by comparing its performance with that of five state-of-the-art methods on four publicly available datasets and twelve synthetic datasets, using five criteria: Precision, Recall, F1 Score, ROC, and Execution time. Our initial experimental results clearly indicate that OutCenTR outperforms the existing methods. |
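As a minimal illustration, and not the authors' evaluation code, the three threshold-based criteria named above (Precision, Recall, F1 Score) can be computed as below. The sketch treats "exploited" as the positive class. The function name `precision_recall_f1` and the toy labels are hypothetical. ROC and execution time are not covered.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class.

    Assumption for illustration: label 1 means "exploited", 0 means "not exploited".
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: exploited and predicted as exploited.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: not exploited but predicted as exploited.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: exploited but missed by the predictor.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical, imbalanced toy labels: 2 exploited vulnerabilities out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))       # (0.5, 0.5, 0.5)

# A trivial predictor that never flags anything.
always_negative = [0] * len(y_true)
print(precision_recall_f1(y_true, always_negative))  # (0.0, 0.0, 0.0)
```

On this toy data the trivial all-negative predictor is still 80% accurate, yet its recall is zero. This gap is one common reason that precision, recall and F1 are preferred over plain accuracy when exploited vulnerabilities form a small minority class.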