A cluster-analysis-based feature-selection method for software defect prediction

Authors: Shulong Liu, Xiang Chen, Daoxu Chen, Qing Gu, Wangshu Liu
Year of publication: 2016
Subject:
Source: SCIENTIA SINICA Informationis. 46:1298-1320
ISSN: 1674-7267
DOI: 10.1360/n112015-00276
Description: By mining historical software repositories, software defect prediction builds defect-prediction models that identify potentially faulty modules in projects under testing. However, redundant and irrelevant features in the gathered datasets can reduce the effectiveness of existing methods. This paper proposes FECAR, a novel cluster-analysis-based feature-selection method. First, the original features are clustered using a feature-to-feature correlation (FFC) measure. Then, within each cluster, features are ranked by a feature-class relevance (FCR) measure, and a given number of top-ranked features is selected. In the empirical studies, symmetric uncertainty serves as the FFC measure, and information gain, chi-square, or ReliefF serves as the FCR measure. Using real-world projects such as Eclipse and NASA, we evaluate prediction performance after applying FECAR and analyze the redundancy rate and selection proportion of the selected feature subsets. The results demonstrate the effectiveness of FECAR.
Database: OpenAIRE
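The two-stage procedure described in the abstract (cluster features by an FFC measure, then keep the top features per cluster by an FCR measure) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. Assumptions: features are already discretized; symmetric uncertainty uses the standard definition SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); the clustering step is a hypothetical average-linkage agglomerative merge down to a fixed number of clusters `k` (the abstract does not name the clustering algorithm); and only information gain is shown as the FCR measure, although chi-square or ReliefF could be passed in instead. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(x, y):
    """Information gain (mutual information) IG(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), in [0, 1]; 0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 2.0 * info_gain(x, y) / denom if denom > 0 else 0.0


def cluster_features(columns, k):
    """Hypothetical FFC clustering: average-linkage agglomerative merging on SU.

    Starts with one cluster per feature and repeatedly merges the two clusters
    with the highest average pairwise SU until k clusters remain.
    """
    n = len(columns)
    su = {(i, j): symmetric_uncertainty(columns[i], columns[j])
          for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sims = [su[min(i, j), max(i, j)] for i in clusters[a] for j in clusters[b]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, best_pair = avg, (a, b)
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def fecar_select(columns, labels, k, per_cluster=1, fcr=info_gain):
    """Cluster features by SU, then keep the per_cluster best features by FCR."""
    selected = []
    for cluster in cluster_features(columns, k):
        # Rank features in this cluster by relevance to the class label.
        ranked = sorted(cluster, key=lambda i: fcr(columns[i], labels), reverse=True)
        selected.extend(ranked[:per_cluster])
    return sorted(selected)


# Toy, made-up data: 8 modules, 4 discretized metrics, binary defect label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: matches the label exactly
    [0, 0, 0, 1, 1, 1, 1, 1],  # f1: near-duplicate of f0 (redundant)
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: unrelated to the label
    [1, 0, 1, 0, 1, 0, 1, 0],  # f3: complement of f2 (redundant with f2)
]
print(fecar_select(columns, labels, k=2))  # → [0, 2]
```

Here the redundant pairs {f0, f1} and {f2, f3} fall into separate clusters, and only one representative of each is kept. Clustering before ranking is what lets the method discard redundant features, while the per-cluster FCR ranking discards the less relevant member of each group.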