Abstract: |
Anomaly detection in industrial control systems faces the problem of data imbalance, and the class overlap present in imbalanced data makes detection harder for classifiers. Common coping strategies rely on class balancing or overlap detection, but these approaches suffer from poor model stability or a low overlap recognition rate. In response, a hybrid sampling method for the overlap region, OverlapRHS, is proposed. It uses support vector data description to construct overlap detection models on the majority and minority class samples respectively, and applies hybrid sampling to samples in the overlap region by combining synthetic minority class oversampling with neighborhood cleaning. The method is combined with four classical classifiers, tested on four publicly available imbalanced datasets, and compared with four other sampling methods for handling imbalance. The experimental results show that the proposed method effectively detects the overlap data in an imbalanced dataset and improves classifier training through efficient, targeted hybrid sampling. This improves the anomaly detection performance of classifiers on imbalanced data and gives significant advantages over the other sampling methods. [ABSTRACT FROM AUTHOR] |