Stability of Software Defect Prediction in Relation to Levels of Data Imbalance

Authors: Tihana Galinac Grbac, Mauša, G., Dalbelo-Bašić, B.
Contributors: Budimac, Zoran
Language: English
Year of publication: 2013
Subject:
Source: Scopus-Elsevier
Description: Software defect prediction is an important decision-support activity in software quality assurance. Its goal is to reduce verification costs by predicting which system modules are more likely to contain defects, enabling more efficient allocation of verification resources. However, no prediction method is both widely applicable and consistently well performing. The main reason lies in the nature of software datasets: their imbalance, their complexity, and their dependence on the application domain. In this paper we propose a research strategy for studying how stable the performance of different machine learning methods is across different levels of imbalance in software defect prediction datasets. We also provide a preliminary case study on a dataset from the NASA MDP open repository, using multivariate binary logistic regression with forward and backward feature selection. The results indicate that performance becomes unstable at around 80% imbalance.
Database: OpenAIRE
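
Illustration: the strategy outlined in the description (train a classifier at several imbalance levels and observe how much its performance varies) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: "imbalance" is read as the share of the majority (non-defective) class, the data are synthetic rather than taken from NASA MDP, the G-mean metric and the standard deviation as a stability measure are choices made for this example, the feature selection step is omitted, and every function name is hypothetical.

```python
import math
import random
import statistics


def make_synthetic(n_per_class=400, seed=1):
    """Two Gaussian blobs standing in for module metrics (assumption: not NASA MDP data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(label)
    return rows, labels


def subsample(labels, majority_share, size, rng):
    """Pick `size` row indices so that label 0 makes up `majority_share` of them."""
    major = [i for i, y in enumerate(labels) if y == 0]
    minor = [i for i, y in enumerate(labels) if y == 1]
    n_major = round(size * majority_share)
    picked = rng.sample(major, n_major) + rng.sample(minor, size - n_major)
    rng.shuffle(picked)
    return picked


def predict_proba(weights, x):
    """Logistic model; the last weight is the intercept."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, epochs=100, lr=0.1):
    """Binary logistic regression fitted by plain batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def g_mean(weights, rows, labels):
    """Geometric mean of true-positive and true-negative rates at a 0.5 threshold."""
    tp = fn = tn = fp = 0
    for x, y in zip(rows, labels):
        pred = 1 if predict_proba(weights, x) >= 0.5 else 0
        if y == 1:
            tp, fn = tp + pred, fn + (1 - pred)
        else:
            tn, fp = tn + (1 - pred), fp + pred
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def stability_by_imbalance(rows, labels, shares, repeats=10, size=200, seed=0):
    """Map each majority share to (mean, standard deviation) of G-mean over repeated draws."""
    rng = random.Random(seed)
    result = {}
    for share in shares:
        scores = []
        for _ in range(repeats):
            idx = subsample(labels, share, size, rng)
            train, test = idx[: size // 2], idx[size // 2:]
            w = fit_logistic([rows[i] for i in train], [labels[i] for i in train])
            scores.append(g_mean(w, [rows[i] for i in test], [labels[i] for i in test]))
        result[share] = (statistics.mean(scores), statistics.stdev(scores))
    return result


if __name__ == "__main__":
    rows, labels = make_synthetic()
    report = stability_by_imbalance(rows, labels, [0.5, 0.7, 0.8, 0.9, 0.95])
    for share, (mean, spread) in report.items():
        print(f"majority share {share:.2f}: G-mean {mean:.3f} +/- {spread:.3f}")
```

A larger spread at a given majority share would suggest less stable performance at that imbalance level. Whether a threshold near 80% appears depends on the data and metric, so this sketch should not be expected to reproduce the reported result.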