Tackling Feature Selection Problems with Genetic Algorithms in Software Defect Prediction for Optimization

Authors: Yandra Arkeman, Irman Hermadi, Rizal Broer Bahaweres, Arif Imam Suroso, Alam Wahyu Hutomo, Indra Permana Solihin
Publication year: 2020
Subject:
Source: 2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS).
DOI: 10.1109/icimcis51567.2020.9354282
Description: Software defect prediction improves software quality by finding and tracking defective modules, which helps reduce costs during software testing. Machine learning methods can be applied to predict defects in each software module. However, software defect prediction datasets typically have two problems: class imbalance, with very few defective modules compared to non-defective ones, and noisy attributes caused by irrelevant features. Together, these problems lead to overfitting and biased classification results, which significantly reduce the performance of the machine learning model. In this study, we propose using bagging and genetic algorithms (GA) to improve the classification performance of machine learning models for software defect prediction based on Logistic Regression, Naive Bayes, SVM, KNN, and Decision Tree. Bagging and GA address the two main problems in software defect prediction: class imbalance and feature selection, respectively. We used 6 NASA Promise datasets to evaluate classification performance in terms of AUC and G-Mean values. Results using 10-fold cross-validation show that the proposed method improves classification performance compared to the original algorithms. The Decision Tree shows the highest performance on 3 of the datasets tested, with the highest value of 94.61% on the KC4 dataset. We also compare GA with another nature-inspired algorithm, Particle Swarm Optimization (PSO). The results show that all machine learning models perform better with GA than with PSO.
Database: OpenAIRE
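
The abstract names genetic-algorithm feature selection and the G-Mean metric but gives no implementation details. The following is a minimal, standard-library-only sketch of what such a search might look like, not the authors' method. The function names (`g_mean`, `ga_select`, `toy_fitness`), the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are assumptions chosen for illustration. Bagging and the classifiers themselves are not implemented here. In a real experiment the fitness function would presumably be something like the cross-validated AUC or G-Mean of a bagged classifier trained on the selected features.

```python
import math
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity and specificity (label 1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0  # undefined without both classes; report 0 by convention
    return math.sqrt((tp / pos) * (tn / neg))


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,      # assumed value, not from the paper
    generations: int = 30,   # assumed value, not from the paper
    cx_rate: float = 0.8,    # assumed value, not from the paper
    mut_rate: float = 0.05,  # assumed value, not from the paper
    seed: int = 0,
) -> Mask:
    """Search for a feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick() -> Mask:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if n_features > 1 and rng.random() < cx_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


def toy_fitness(mask: Mask) -> float:
    """Hypothetical stand-in for a classifier score: features 0 and 2 are
    treated as informative, and every kept feature costs a small penalty."""
    return mask[0] + mask[2] - 0.1 * sum(mask)


if __name__ == "__main__":
    selected = ga_select(8, toy_fitness)
    print("selected mask:", selected, "fitness:", toy_fitness(selected))
    print("G-Mean example:", g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Passing the fitness function as a parameter keeps the search independent of any particular classifier or metric, so the same loop could be reused with AUC, G-Mean, or another score.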