ALTRA: Cross-Project Software Defect Prediction via Active Learning and Tradaboost

Authors: Zhidan Yuan, Xiang Chen, Zhanqi Cui, Yanzhou Mu
Language: English
Publication year: 2020
Subject:
Source: IEEE Access, Vol 8, Pp 30037-30049 (2020)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2972644
Description: Cross-project defect prediction (CPDP) methods can be used when the target project is new or lacks enough labeled program modules. In such target projects, we can easily extract the modules and measure them with software measurement tools. However, labeling these program modules is time-consuming and error-prone, and it requires professional domain knowledge. Moreover, directly using labeled modules from other projects (i.e., the source projects) cannot achieve satisfactory performance in most cases because of the large difference in data distribution. In this article, to the best of our knowledge, we are the first to propose a method, ALTRA, that uses both active learning and TrAdaBoost to alleviate this issue. In particular, we first analyze the unlabeled modules in the target project and use the Burak filter to select similar labeled modules from the source project. Then we use active learning to choose representative unlabeled modules from the target project and ask experts to label the type (i.e., defective or non-defective) of these modules. Next, we use TrAdaBoost to determine the weights of the labeled modules in the source project and the target project, and then construct the model via a weighted support vector machine. After selecting a small number of modules (i.e., only 5% of the modules) in the target project, we terminate ALTRA and return the final constructed model. To show the effectiveness of ALTRA, we choose 10 large-scale open-source projects from different application domains. In terms of both the F1 and AUC performance indicators, we find that ALTRA performs significantly better than seven state-of-the-art CPDP baselines. Moreover, we also show that the use of the Burak filter, the uncertainty-based active learning strategy, the class-imbalance learning method and TrAdaBoost are all competitive in ALTRA.
Database: Directory of Open Access Journals
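
Illustrative sketch: The abstract names three building blocks (the Burak filter, uncertainty-based active learning, and TrAdaBoost reweighting) without giving their details. The following is a minimal sketch of how each step might look, not the authors' implementation. The function names, the default k=10, the use of an SVM-style decision score as the uncertainty measure, and the error clipping are all assumptions made for illustration; the weight update follows the commonly cited TrAdaBoost formulation, and training the weighted support vector machine itself is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two metric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def burak_filter(source_X, target_X, k=10):
    """Return sorted indices of source modules that are among the k nearest
    neighbours of at least one target module (duplicates removed).
    k=10 is an assumed default; the abstract does not state a value."""
    selected = set()
    for t in target_X:
        ranked = sorted(range(len(source_X)),
                        key=lambda i: euclidean(source_X[i], t))
        selected.update(ranked[:k])
    return sorted(selected)


def most_uncertain(scores):
    """Uncertainty sampling (assumed variant): index of the unlabeled module
    whose decision score is closest to the boundary at 0."""
    return min(range(len(scores)), key=lambda i: abs(scores[i]))


def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost weight update.

    w_src, w_tgt     : current weights of source / labeled target modules
    err_src, err_tgt : 0/1 errors of the current weak learner per module
    n_rounds         : total number of boosting rounds
    """
    n = len(w_src)
    # Weighted error is measured on the labeled target modules only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    # Clip so that beta_t stays well defined (an assumption of this sketch).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source modules lose weight; misclassified target
    # modules gain weight.
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt
```

In a full loop, one would presumably filter the source data once, then alternate between querying an expert for the most uncertain target module and refitting the weighted classifier until about 5% of the target modules are labeled, as the abstract describes.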