Addressing the Big Data Multi-class Imbalance Problem with Oversampling and Deep Learning Neural Networks
Authors: | Eréndira Rendón, R. Alejo, E. E. Granda-Gutiérrez, V. M. González-Barcenas, R. M. Valdovinos |
---|---|
Year of publication: | 2019 |
Subject: | Artificial neural network, Computer science, Deep learning, Big data, 02 engineering and technology, Machine learning, Undersampling, Perceptron, Class imbalance, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Oversampling, 020201 artificial intelligence & image processing, Artificial intelligence, Classifier (UML) |
Source: | Pattern Recognition and Image Analysis, ISBN 9783030313319, IbPRIA (1) |
Description: | The class imbalance problem is a challenging situation in machine learning that also appears frequently in recent Big Data applications. The most studied techniques for dealing with class imbalance have been Random Over-Sampling (ROS), Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE), especially in two-class scenarios. However, multi-class imbalance at Big Data scale has not yet been studied extensively, and only a few investigations have been performed. In this work, the effectiveness of ROS and SMOTE is analyzed in the Big Data multi-class imbalance context. The KDD99 dataset, a popular multi-class imbalanced big dataset, was used to test these oversampling techniques prior to training a Deep Learning Multi-Layer Perceptron. Results show that ROS and SMOTE are not always enough to improve classifier performance on the minority classes. However, they slightly increase the overall performance of the classifier compared with the unsampled data. |
Database: | OpenAIRE |
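
The description compares ROS and SMOTE as preprocessing for a multi-class imbalanced dataset. The following is a minimal NumPy sketch of both techniques, not the authors' implementation. It assumes that every class is oversampled up to the size of the largest class and that SMOTE interpolates towards one of k = 5 nearest same-class neighbours (the value used in the original SMOTE formulation). The function names `random_oversample` and `smote` and the toy data are hypothetical.

```python
import numpy as np

def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen samples of each smaller class
    until it has as many samples as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)

def smote(X, y, rng, k=5):
    """SMOTE: each synthetic sample lies at a random point on the segment
    between a class sample and one of its k nearest same-class neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n >= target or n < 2:
            continue  # largest class, or too few samples to interpolate
        Xc = X[y == c]
        # Pairwise Euclidean distances within the class (quadratic in class size).
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        kk = min(k, n - 1)
        neighbours = np.argsort(d, axis=1)[:, :kk]
        m = target - n
        base = rng.integers(0, n, size=m)
        nb = neighbours[base, rng.integers(0, kk, size=m)]
        gap = rng.random((m, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(m, c, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy three-class imbalanced data (hypothetical, not KDD99).
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 3))
y = np.array([0] * 100 + [1] * 7 + [2] * 3)
Xr, yr = random_oversample(X, y, rng)
Xs, ys = smote(X, y, rng)
print(np.bincount(yr), np.bincount(ys))  # [100 100 100] [100 100 100]
```

ROS only repeats existing points, whereas SMOTE creates new points inside each minority class, which is why the two can affect a downstream classifier such as a Multi-Layer Perceptron differently. The all-pairs distance matrix keeps the sketch short but does not scale to large classes; at Big Data scale a nearest-neighbour index, or an existing library such as imbalanced-learn (`RandomOverSampler`, `SMOTE`), would be the more practical choice.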