BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques

Author:	Anil Kumar Tripathi, Ravi Bhushan Mishra, Sushant Kumar Pandey
Year of publication:	2020
Subject:	0209 industrial biotechnology; Wilcoxon signed-rank test; business.industry; Computer science; Deep learning; General Engineering; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; Missing data; Ensemble learning; Software metric; Computer Science Applications; 020901 industrial engineering & automation; Software bug; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Feature learning
Source:	Expert Systems with Applications, 144:113085
ISSN:	0957-4174
DOI:	10.1016/j.eswa.2019.113085
Description:	In software fault prediction, many factors hinder the detection of faulty modules, such as missing values or samples, data redundancy, irrelevant features, and feature correlation. Many researchers have built software bug prediction (SBP) models that classify modules as faulty or non-faulty on the basis of their software metrics. So far, very few studies have addressed the class imbalance problem in SBP. The main objective of this paper is to show that feature selection and machine learning methods can yield favorable results in distinguishing defective from non-defective software modules. We propose BPDET (Bug Prediction using Deep representation and Ensemble learning Techniques), a rudimentary classification-based framework for SBP that combines ensemble learning (EL) with deep representation (DR). The software metrics used for SBP are mostly conventional ones. A stacked denoising autoencoder (SDA), a robust feature learning method, is used to obtain a deep representation of the software metrics. The proposed model has two main stages: a deep learning stage and a two-layer EL stage (TEL). In the first stage, features are extracted with the SDA; in the second stage, TEL is applied to them. TEL also handles the class imbalance problem. Experiments were performed mainly on 12 NASA datasets to demonstrate the effectiveness of DR, SDA, and TEL. Performance is analyzed in terms of the Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (PRC), F-measure, and time. Of the 12 datasets, BPDET surpasses existing state-of-the-art bug prediction methods in MCC on 11, in ROC AUC on 6, in PRC on all 12, and in F-measure on 8. We tested BPDET with the Wilcoxon rank-sum test, which rejects the null hypothesis at α = 0.025. We also tested the stability of the model with 5-, 8-, 10-, 12-, and 15-fold cross-validation and obtained similar results.
Finally, we conclude that BPDET is stable and outperforms EL and other state-of-the-art techniques on most of the datasets.
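The evaluation metrics named in the abstract (MCC, ROC AUC, F-measure) can be illustrated with a minimal sketch. It assumes binary labels with 1 = faulty module (positive class) and 0 = non-faulty, and it is not the authors' code; the helper names `confusion_counts`, `f_measure`, `mcc`, and `roc_auc` are hypothetical. AUC is computed here through its rank interpretation: the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one, with ties counted as one half. The PRC metric is not covered.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming label 1 means 'faulty' (positive class)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; uses all four cells, so it stays informative
    when faulty modules are a small minority (class imbalance)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of faulty vs. non-faulty scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one non-faulty module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (invented for illustration): 3 faulty and 5 non-faulty modules.
    y_true = [1, 0, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.3, 0.2]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print("counts:", (tp, fp, fn, tn))           # (2, 1, 1, 4)
    print("F-measure:", f_measure(tp, fp, fn))   # 2/3
    print("MCC:", mcc(tp, fp, fn, tn))           # 7/15
    print("AUC:", roc_auc(y_true, scores))       # 14/15
```

On this toy data the classifier gets F-measure 2/3 but MCC only 7/15 (about 0.47), which shows why MCC is often preferred for imbalanced defect data: it penalizes errors on both classes rather than only on the positive one. The pairwise AUC loop is O(n·m) and is meant for clarity; a sort-based rank computation scales better on large datasets.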
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1c30113452295ede6c2570fc06e75542 https://doi.org/10.1016/j.eswa.2019.113085