Comparative analysis of resampling techniques on Machine Learning algorithm

Authors: Amelia, Tri Suci; Hasibuan, Mila Nirmala Sari; Pane, Rahmadani
Publication year: 2022
Subject:
Source: Sinkron : jurnal dan penelitian teknik informatika; Vol. 7 No. 2 (2022): Articles Research Volume 7 Issue 2, April 2022; 628-634
ISSN: 2541-2019
2541-044X
DOI: 10.33395/sinkron.v7i2.11427
Description: Classification algorithms in data science generally assume that the classes in the training data are equally distributed. Real-world datasets, however, often have an imbalanced class distribution, consisting of a majority class and a minority class. The minority class is usually the more interesting and more important one to identify, so correctly classifying a minority class sample is more valuable than correctly classifying a majority class sample. An imbalanced class distribution makes it difficult for a classification algorithm to classify minority class samples correctly. If a model performs well on majority class samples but poorly on minority class samples, the imbalance is a crucial problem that must be addressed. The solutions commonly offered for this problem are oversampling the minority class and/or undersampling the majority class. In this study, the authors applied various resampling techniques to various machine learning classification algorithms to find the combinations of resampling technique and algorithm that achieve high recall on minority class samples while still taking majority class classification into account.
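The two resampling families named in the description, and the recall metric used to judge them, can be illustrated with a minimal sketch. It is not the authors' code: the function names (`random_oversample`, `random_undersample`, `recall`) and the toy data are hypothetical, and plain random resampling stands in for whichever specific techniques the study compared. In practice, resampling would normally be applied to the training split only.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


def recall(y_true, y_pred, positive):
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced dataset (hypothetical): 8 majority samples, 2 minority samples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 samples
print(Counter(y_under))  # both classes now have 2 samples
```

Oversampling keeps all of the original data at the cost of duplicated minority samples, while undersampling discards majority samples. Comparing minority class recall after each treatment, alongside majority class performance, is the kind of trade-off the description refers to.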
Database: OpenAIRE