Classification with reject option for software defect prediction

Autor: Diego P. P. Mesquita, Lincoln S. Rocha, Joo Paulo P. Gomes, Ajalmar R. da Rocha Neto
Rok vydání: 2016
Předmět:
Zdroj: Applied Soft Computing. 49:1085-1093
ISSN: 1568-4946
DOI: 10.1016/j.asoc.2016.06.023
Popis: Graphical abstractDisplay Omitted HighlightsWe propose the use of classification with reject option for software defect prediction (SDP) as a way to incorporate additional knowledge in the SDP process.We propose two variants of the extreme learning machine with reject option.It is proposed an ELM with reject option for imbalanced datasets.The proposed method is tested on five real world software datasets.An example is shown to illustrate how the rejected software modules can be further analyzed to improve the final SDP accuracy. ContextSoftware defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of yes or no decision is an important drawback in the decision-making process and if not precise may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way. ObjectiveWe seek to develop a SDP method that (i) incorporates a reject option in the classifier to improve the reliability in the decision-making process; and (ii) makes it possible postpone the final decision related to rejected modules for an expert analysis or even for another classifier using extra domain knowledge. MethodWe develop a SDP method called rejoELM and its variant, IrejoELM. Both methods were built upon the weighted extreme learning machine (ELM) with reject option that makes it possible postpone the final decision of non-classified modules, the rejected ones, to another moment. While rejoELM aims to maximize the accuracy for a rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option for imbalanced datasets. ResultsrejoEM and IrejoELM are tested on five datasets of source code metrics extracted from real world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as a performance metric. ConclusionIt is concluded that rejoELM is a valid alternative for classification with reject option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
Databáze: OpenAIRE