A cluster-analysis-based feature-selection method for software defect prediction
Author: | Shulong Liu, Xiang Chen, Daoxu Chen, Qing Gu, Wangshu Liu |
---|---|
Year of publication: | 2016 |
Subject: |
Engineering
General Computer Science; Feature selection; Software; Software bug; Software quality assurance; Feature (computer vision); Redundancy (engineering); Relevance (information retrieval); Data mining; Engineering (miscellaneous); Selection (genetic algorithm) |
Source: | SCIENTIA SINICA Informationis. 46:1298-1320 |
ISSN: | 1674-7267 |
DOI: | 10.1360/n112015-00276 |
Description: | By mining historical software repositories, software defect prediction constructs models that predict potentially faulty modules in projects under test. However, redundant and irrelevant features in the gathered datasets may reduce the effectiveness of existing methods. A novel cluster-analysis-based feature-selection method (FECAR) is proposed. The original features are first clustered according to a feature-to-feature correlation (FFC) measure. Then, within each cluster, features are ranked by a feature-to-class relevance (FCR) measure and a given number of top-ranked features are chosen. In the empirical studies, symmetric uncertainty is used as the FFC measure, and information gain, chi-square, or ReliefF as the FCR measure. Using datasets from real-world projects such as Eclipse and NASA, we evaluate prediction performance after applying FECAR and analyze the redundancy rate and selection proportion of the selected feature subset. The results demonstrate the effectiveness of FECAR. |
Database: | OpenAIRE |
External link: |
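
The two-stage procedure in the description (cluster features by FFC, then rank within each cluster by FCR) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes discrete-valued features, uses symmetric uncertainty as the FFC measure and information gain as the FCR measure (both named in the abstract), and substitutes a simple greedy threshold-based grouping for the clustering step, since the record does not state which clustering algorithm FECAR uses. The function names, the `threshold` and `per_cluster` parameters, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0


def select_features(features, labels, threshold=0.5, per_cluster=1):
    """Group correlated features, then keep the most relevant per group.

    features: dict mapping feature name -> list of discrete values.
    labels:   list of class labels (e.g. 0 = clean, 1 = defective).
    ASSUMPTION: greedy grouping against each cluster's first member stands
    in for the clustering step, which the abstract does not specify.
    """
    clusters = []  # each cluster is a list of names; element 0 is its seed
    for name, column in features.items():
        for cluster in clusters:
            seed = features[cluster[0]]
            if symmetric_uncertainty(column, seed) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar cluster: start a new one

    selected = []
    for cluster in clusters:
        # Rank by relevance to the class label, highest first.
        ranked = sorted(
            cluster,
            key=lambda f: information_gain(features[f], labels),
            reverse=True,
        )
        selected.extend(ranked[:per_cluster])
    return clusters, selected


# Hypothetical toy data: f1 and f2 are near-duplicates, f3 is unrelated.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
clusters, selected = select_features(toy, labels)
print(clusters, selected)
```

In this sketch the redundant pair `f1` and `f2` falls into one group and only the member more relevant to the label is kept, which is the redundancy-removal effect the description attributes to the clustering stage. Swapping `information_gain` for a chi-square or ReliefF score in the ranking step would correspond to the other FCR choices mentioned.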