DNNAttention: A deep neural network and attention based architecture for cross project defect number prediction

Authors: Anil Kumar Tripathi, Sushant Kumar Pandey
Year of publication: 2021
Subject:
Source: Knowledge-Based Systems. 233:107541
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2021.107541
Description: Software defect prediction (SDP) is the process of detecting fault-prone classes or modules in a software system. It helps allocate resources more effectively before the testing phase. When a project lacks adequate data of its own, defects can be predicted by training the classifier on data from other projects, an approach called cross-project defect prediction (CPDP). Cross-project defect number prediction (CPDNP) goes one step beyond CPDP by also estimating the number of defects in each module of a software system; we treat it as a regression problem. This article deals with the CPDNP mechanism and proposes a CPDNP architecture, called DNNAttention, that employs a deep neural network and an attention layer. We synthesize a substantial dataset, named cross-heap, by amalgamating 44 projects from the PROMISE repository. We feed the cross-heap into DNNAttention to train it and evaluate its performance over the 44 datasets by applying transfer learning. We also address the class imbalance (CI) and overfitting problems by employing multi-label random over-sampling and dropout regularization, respectively. We compare the performance of DNNAttention against eight baseline methods using mean squared error (MSE), mean absolute error (MAE), and accuracy. Of the 44 projects, DNNAttention achieves the minimum MSE on 19 and the minimum MAE on 20, and on 19 projects the accuracy of the proposed model surpasses that of existing techniques. We also compare its performance in terms of Kendall and Fault-Percentile-Average with a recent unsupervised method and find that DNNAttention significantly outperforms it. Moreover, the improvement of DNNAttention over the baseline methods in terms of MAE, MSE, and accuracy when inspecting 20% of the lines of code is substantial. In most situations, the improvements are significant, and they have a large effect size across all 44 projects.
Database: OpenAIRE
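
Note: the error measures named in the description (MSE, MAE, and Fault-Percentile-Average) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names and the sample defect counts are hypothetical, and Fault-Percentile-Average is taken here, as an assumption, to mean the average over m = 1..k of the share of actual defects found in the m modules with the highest predicted defect counts.

```python
def _check_lengths(actual, predicted):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted defect counts."""
    _check_lengths(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted defect counts."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fault_percentile_average(actual, predicted):
    """Assumed definition: rank modules by predicted defects (highest first),
    then average the cumulative share of actual defects over every cutoff."""
    _check_lengths(actual, predicted)
    total = sum(actual)
    if total == 0:
        raise ValueError("at least one actual defect is required")
    # Stable sort: modules with equal predictions keep their input order.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    found = 0
    share_sum = 0.0
    for i in order:
        found += actual[i]
        share_sum += found / total
    return share_sum / len(actual)


if __name__ == "__main__":
    # Hypothetical defect counts for four modules (not data from the paper).
    actual = [0, 2, 1, 5]
    predicted = [0, 1, 1, 3]
    print(mean_squared_error(actual, predicted))        # 1.25
    print(mean_absolute_error(actual, predicted))       # 0.75
    print(fault_percentile_average(actual, predicted))  # 0.875
```

Lower MSE and MAE indicate closer defect-count estimates, while a higher Fault-Percentile-Average indicates that the modules ranked first by the model contain more of the actual defects.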