Performance Analysis of Ensemble Based Approaches to Mitigate Class Imbalance Problem after Applying Normalization

Author:	Mahit Kumar Paul, Atik Shahriar Pranto
Year of publication:	2021
Subject:	Set (abstract data type); Data set; Normalization (statistics); ComputingMethodologies_PATTERNRECOGNITION; Boosting (machine learning); Computer science; Undersampling; Oversampling; Data mining; computer.software_genre; computer; Data modeling; Random forest
Source:	2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI).
DOI:	10.1109/acmi53878.2021.9528132
Description:	In a class-imbalanced data set, one class contains more instances than the other, which is a critical problem in data mining. Many approaches, such as oversampling, undersampling, and cost-sensitive methods, have been developed to mitigate the effects of class imbalance, but these methods suffer from various shortcomings. Existing studies have rarely applied normalization to the imbalanced data set to mitigate these effects. In this work, we implemented two state-of-the-art data balancing methods, Random Undersampling (RUS) and Random Oversampling (ROS), each ensembled by the AdaBoost algorithm. We then compared the two methods with a recently developed approach, the Random Splitting data balancing (SplitBal) method, with and without normalization applied to the imbalanced data set. Three well-known normalization techniques are used: min-max, z-score, and robust scaling. SplitBal is an ensemble method that first converts the imbalanced data set into several balanced data sets. Multiple classification models are then built from the balanced data sets and combined by the max ensemble rule. The empirical analysis on fifteen imbalanced data sets shows that, for the Random Forest classifier, SplitBal with min-max normalization outperforms the other data balancing methods considered in this work.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::809ef57b1172785629275d23648398bd https://doi.org/10.1109/acmi53878.2021.9528132
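
The description names three normalization techniques and outlines SplitBal only in broad terms, so the following is a minimal Python sketch of those ideas and not the authors' implementation. All function names are hypothetical. The sketch assumes the usual textbook definitions of min-max, z-score, and robust scaling, assumes that SplitBal partitions the majority class into bins of roughly minority-class size and pairs each bin with the full minority class, and assumes that the max ensemble rule takes, for each class, the highest probability reported by any model.

```python
import random
import statistics


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def robust_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr else 0.0 for v in values]


def split_balance(majority, minority, seed=0):
    """Assumed SplitBal-style split: shuffle the majority class, cut it into
    bins of roughly minority size, and pair each bin with the minority class."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_bins = max(1, round(len(majority) / len(minority)))
    bins = [shuffled[i::n_bins] for i in range(n_bins)]
    return [(majority_bin, list(minority)) for majority_bin in bins]


def max_rule(probabilities):
    """Assumed max ensemble rule: for each class keep the highest probability
    given by any model, then predict the class with the largest value."""
    n_classes = len(probabilities[0])
    combined = [max(p[c] for p in probabilities) for c in range(n_classes)]
    return combined.index(max(combined))
```

In this sketch, each balanced pair returned by `split_balance` would be used to train one classifier, and `max_rule` would combine the per-class probabilities that those classifiers produce for a test instance. Normalization would be applied feature by feature before the split.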