Software Fault Prediction Using Optimal Classifier Selection: An Ensemble Approach.

Author: Agrawalla, Bikash; Reddy, B Ramachandra
Subject:
Source: Procedia Computer Science; 2024, Vol. 235, p2965-2974, 10p
Abstract: Fault prediction is the process of using data analysis and machine learning models to anticipate potential defects or faults in a software system. Using only base machine learning models for software fault prediction leads to limited performance, difficulty in handling non-linear relationships and imbalanced data, inadequate feature representation, and limited complexity handling. To overcome these challenges, this paper proposes a new technique for selecting the classifiers that form a heterogeneous ensemble. The main goal is to trim out the classifiers that perform poorly compared to the other base classifiers, which can result in a more effective ensemble and better results. The algorithm proposed in this paper finds a subset of classifiers that can perform better than using all the classifiers. The main challenge was how to identify the poor-performing classifiers; it is addressed by experimenting with different threshold values to choose the trimmed set of classifiers. To evaluate the proposed model, 8 benchmark software fault datasets taken from PROMISE and the Apache repository were used, with AUC as the performance measure. The experimental results demonstrate the effectiveness of our algorithm compared to the traditional approaches, which used all the base classifiers. There is a significant increase in the AUC values for 6 datasets out of 8, while using the average of probabilities and majority voting, it was seen that there is improvement in 7 out of 8 datasets used. The best-performing dataset using the average of probabilities is ARC, where the AUC increases from 0.6505 to 0.694, and using majority voting, the best-performing dataset is XALAN, where the AUC increases from 0.5455 to 0.679.
From this, it can be seen that the proposed ensemble approach achieved higher AUC values for the tested datasets when compared to the base machine learning classifiers. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index
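
The trimming idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so this example assumes that a classifier is kept when its validation AUC meets a fixed threshold; the classifier names, the toy probabilities, the threshold of 0.6, and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a faulty module (1) is scored above a clean one (0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def trim(probs: Dict[str, List[float]], labels: Sequence[int], threshold: float) -> List[str]:
    """Assumed criterion: keep classifiers whose AUC is at least `threshold`."""
    kept = [name for name, p in probs.items() if auc(labels, p) >= threshold]
    return kept or list(probs)  # fall back to the full pool if nothing survives


def average_of_probabilities(probs: Dict[str, List[float]], names: List[str]) -> List[float]:
    """Combine the selected classifiers by averaging their fault probabilities."""
    n = len(probs[names[0]])
    return [sum(probs[c][i] for c in names) / len(names) for i in range(n)]


def majority_vote(probs: Dict[str, List[float]], names: List[str], cutoff: float = 0.5) -> List[int]:
    """Predict faulty (1) only when a strict majority votes faulty; ties give 0."""
    n = len(probs[names[0]])
    return [int(2 * sum(probs[c][i] >= cutoff for c in names) > len(names)) for i in range(n)]


# Toy validation data (hypothetical): 1 = faulty module, 0 = clean module.
labels = [1, 0, 1, 0, 1, 0]
probs = {
    "clf_a": [0.9, 0.2, 0.8, 0.3, 0.7, 0.4],
    "clf_b": [0.6, 0.4, 0.7, 0.5, 0.4, 0.3],
    "clf_c": [0.1, 0.9, 0.2, 0.9, 0.3, 0.8],  # deliberately poor performer
}

kept = trim(probs, labels, threshold=0.6)
auc_all = auc(labels, average_of_probabilities(probs, list(probs)))
auc_trimmed = auc(labels, average_of_probabilities(probs, kept))
votes_trimmed = majority_vote(probs, kept)
print(kept, auc_all, auc_trimmed, votes_trimmed)
```

In this toy setup the poorly performing classifier is dropped and the averaged ensemble scores a higher AUC than the full pool. In practice the threshold would be tuned on held-out data, as the abstract describes experimenting with several threshold values.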