Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning.

Authors: Mafarja M; Department of Computer Science, Birzeit University, Birzeit, Palestine., Thaher T; Department of Computer Systems Engineering, Arab American University, Jenin, Palestine.; Information Technology Engineering, Al-Quds University, Abu Dies, Jerusalem, Palestine., Al-Betar MA; Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates.; Irbid, Jordan., Too J; Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal Melaka, Malaysia., Awadallah MA; Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine.; Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates., Abu Doush I; Department of Computing, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait.; Computer Science Department, Yarmouk University, Irbid, Jordan., Turabieh H; Department of Health Management and Informatics, University of Missouri, Columbia, 5 Hospital Drive, Columbia, MO 65212 USA.
Language: English
Source: Applied intelligence (Dordrecht, Netherlands) [Appl Intell (Dordr)] 2023 Feb 09, pp. 1-43. Date of Electronic Publication: 2023 Feb 09.
DOI: 10.1007/s10489-022-04427-x
Abstract: Software Fault Prediction (SFP) is an important process for detecting faulty components of software, such as faulty classes or modules, early in the software development life cycle. In this paper, a machine learning (ML) framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter, seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers. The performance of RF is then improved further using a dimensionality reduction method, the binary whale optimization algorithm (BWOA), to eliminate irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. The SFP datasets are sixteen datasets selected from the PROMISE repository, covering software projects of different sizes and complexity. The algorithms' performance is compared in terms of accuracy, number of selected features, and fitness value. A comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces significantly better or competitive results on several of the evaluated datasets. This is supported by the 2-tailed P-values of the Wilcoxon signed-rank test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
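The abstract mentions a binary optimizer that selects features and is scored by a fitness function, but it gives no formulas. The following is a minimal sketch of how such a wrapper is commonly set up, not the authors' implementation. The sigmoid (S-shaped) transfer function, the weighting `alpha = 0.99`, and the helper `toy_error_rate` are all assumptions made for illustration. In a real setup, `toy_error_rate` would be replaced by the validation error of a classifier such as RF trained on the selected features.

```python
import math
import random


def s_shaped_transfer(x):
    """Sigmoid transfer: map a continuous position component to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask."""
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]


def fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of classification error and fraction of features kept.

    Lower is better. The form and alpha=0.99 are assumptions, not taken
    from the abstract.
    """
    if not any(mask):
        return float("inf")  # an empty subset cannot train a classifier
    kept_ratio = sum(mask) / len(mask)
    return alpha * error_rate(mask) + (1.0 - alpha) * kept_ratio


def toy_error_rate(mask):
    """Hypothetical stand-in for a classifier's validation error.

    Pretends that only features 0 and 2 are informative: each one that is
    missing from the mask adds 0.2 to a base error of 0.1.
    """
    informative = (0, 2)
    missing = sum(1 for i in informative if not mask[i])
    return 0.1 + 0.2 * missing


if __name__ == "__main__":
    rng = random.Random(0)
    mask = binarize([2.0, -2.0, 2.0, -2.0], rng)
    print(mask, fitness(mask, toy_error_rate))
```

Under these assumptions, a mask that keeps only the informative features scores lower (better) than one that keeps every feature, which is the trade-off between accuracy and subset size that the abstract's comparison criteria reflect.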
Competing Interests: The authors declare that they have no conflict of interest.
(© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.)
Database: MEDLINE