Description: |
In multi-class imbalanced learning, traditional oversampling algorithms are prone to overgeneralization and class overlap, which lead to poor classification performance. To improve the performance of multi-class learning, a sampling safety coefficient for multi-class imbalance oversampling (SSCMIO) algorithm is proposed. First, to prevent overgeneralization, a neighbor sampling safety coefficient is designed that assigns a small weight to neighborhoods that may cause excessive generalization. Then, by considering the global characteristics of the sample points, a reverse neighbor sampling safety coefficient is introduced to prevent new samples from invading other classes, which alleviates the overlap between classes. Finally, the C4.5 decision tree is used as the base classifier. Compared with 7 representative oversampling algorithms on 16 public real-world data sets, SSCMIO obtains more than 11 optimal values on precision, recall, F-measure, MG, and MAUC; the maximum increases on these 5 metrics are 0.4818, 0.3053, 0.3420, 0.2664, and 0.1307, respectively. The experimental results show that the SSCMIO algorithm achieves better classification performance than the other 7 algorithms. |