Addressing the Big Data Multi-class Imbalance Problem with Oversampling and Deep Learning Neural Networks

Author: Eréndira Rendón, R. Alejo, E. E. Granda-Gutiérrez, V. M. González-Barcenas, R. M. Valdovinos
Year of publication: 2019
Subject:
Source: Pattern Recognition and Image Analysis ISBN: 9783030313319
IbPRIA (1)
Description: The class imbalance problem is a challenging situation in machine learning, and it also appears frequently in recent Big Data applications. The most studied techniques for dealing with the class imbalance problem have been Random Over Sampling (ROS), Random Under Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE), especially in two-class scenarios. However, at the Big Data scale, multi-class imbalance scenarios have not yet been extensively studied, and only a few investigations have been performed. In this work, the effectiveness of the ROS and SMOTE techniques is analyzed in the Big Data multi-class imbalance context. The KDD99 dataset, a popular multi-class imbalanced Big Data set, was used to test these oversampling techniques prior to the application of a Deep Learning Multi-Layer Perceptron. Results show that ROS and SMOTE are not always enough to improve classifier performance on the minority classes. However, they slightly increase the overall performance of the classifier compared with the unsampled data.
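The two oversampling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_oversample` and `smote_like_point` are hypothetical, the balancing target (the size of the largest class) is an assumption, and the SMOTE part shows only the interpolation step, leaving out the nearest-neighbour search.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """ROS sketch: duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one (assumed target)."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group the original samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_samples, out_labels = list(samples), list(labels)
    for y, members in by_class.items():
        extra = target - len(members)
        # Draw with replacement from the original members of this class.
        for x in rng.choices(members, k=extra):
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


def smote_like_point(a, b, rng):
    """SMOTE interpolation step only: a synthetic point on the segment
    between minority sample `a` and a neighbour `b` of the same class."""
    gap = rng.random()  # uniform in [0, 1)
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]
```

With toy labels such as `["a", "a", "a", "b", "b", "c"]`, `random_oversample` returns nine samples, three per class, with the original six kept at the front. The resampled data would then be passed to the classifier; the network itself is outside the scope of this sketch.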
Database: OpenAIRE