Support Vector Machines for Class Imbalance Rail Data Classification with Bootstrapping-based Over-Sampling and Under-Sampling

Authors: Yong Yang, Mahdi Mahfouf, Steven F. Thornton, Ali Zughrat
Year of publication: 2014
Subject:
Source: IFAC Proceedings Volumes. 47:8756-8761
ISSN: 1474-6670
DOI: 10.3182/20140824-6-za-1003.00794
Description: Support Vector Machines (SVMs) are a popular machine learning technique that has proven very effective in solving many classical problems with balanced data sets across various application areas. However, SVMs are also reported to perform poorly when learning from heavily imbalanced data sets, in which the majority classes significantly outnumber the minority classes. In this paper, we tackle the problem of learning from a severely imbalanced rail data set using a new iterative SVM algorithm with bootstrapping-based over-sampling and under-sampling. We combine the good generalization ability of SVMs with the class-distribution advantages of resampling techniques. Under-sampling and over-sampling are commonly used methods for overcoming the class imbalance problem. In this work, we also examine the influence of under-sampling and over-sampling on the rail data and show that choosing an optimal sampling rate yields better SVM generalization. Experimental results show that under-sampling outperforms over-sampling. The iterative SVM technique also achieves competitive generalization performance on the under-sampled rail data set, and under-sampling can reduce the computational complexity of the SVM algorithm.
Database: OpenAIRE
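The abstract does not spell out the resampling procedure. What follows is therefore only a minimal sketch of generic bootstrap-based resampling for a binary problem, and it rests on the assumption that "bootstrapping" means drawing samples with replacement. It is not the authors' iterative SVM algorithm. The function `bootstrap_resample` and its `ratio` parameter are hypothetical: `ratio` is the target minority-to-majority size ratio and stands in for the paper's "sampling rate". In over-sampling mode, the function keeps every original sample and adds minority draws with replacement. In under-sampling mode, it keeps every minority sample and replaces the majority class with a smaller bootstrap sample.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_resample(
    X: Sequence,
    y: Sequence[int],
    minority_label: int,
    mode: str,
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List, List[int]]:
    """Rebalance a binary data set by bootstrap (with-replacement) draws.

    Hypothetical helper for illustration only, not the paper's algorithm.
    ratio is the target minority-to-majority size ratio, in (0, 1].
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible draws
    # Assumption: binary labels; every non-minority label counts as majority.
    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    maj_idx = [i for i, label in enumerate(y) if label != minority_label]
    if not min_idx or not maj_idx:
        raise ValueError("both classes must be present")

    if mode == "over":
        # Keep all original samples and add minority draws with replacement
        # until |minority| reaches ratio * |majority|.
        target = max(len(min_idx), round(ratio * len(maj_idx)))
        extra = rng.choices(min_idx, k=target - len(min_idx))
        chosen = maj_idx + min_idx + extra
    else:
        # Keep all minority samples and replace the majority class by a
        # bootstrap sample of size |minority| / ratio (capped at |majority|).
        target = min(len(maj_idx), round(len(min_idx) / ratio))
        chosen = min_idx + rng.choices(maj_idx, k=target)

    rng.shuffle(chosen)  # mix classes so sample order carries no label signal
    return [X[i] for i in chosen], [y[i] for i in chosen]


if __name__ == "__main__":
    X = list(range(100))
    y = [1] * 10 + [0] * 90  # 10 minority vs 90 majority samples
    Xo, yo = bootstrap_resample(X, y, minority_label=1, mode="over")
    Xu, yu = bootstrap_resample(X, y, minority_label=1, mode="under")
    print(len(yo), yo.count(1))  # 180 90
    print(len(yu), yu.count(1))  # 20 10
```

The resampled `X`, `y` could then be passed to any SVM trainer, for example scikit-learn's `sklearn.svm.SVC().fit(X, y)`. In the usage above, over-sampling grows the training set from 100 to 180 samples, while under-sampling shrinks it to 20. Kernel SVM training time typically grows faster than linearly with the number of samples, which illustrates why under-sampling can lower the computational cost. Under-sampling without replacement (`rng.sample`) is an equally common variant.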