Comparative analysis of resampling techniques on Machine Learning algorithm
Authors: | Amelia, Tri Suci; Hasibuan, Mila Nirmala Sari; Pane, Rahmadani |
---|---|
Year of publication: | 2022 |
Source: | Sinkron: jurnal dan penelitian teknik informatika; Vol. 7 No. 2 (2022): Articles Research Volume 7 Issue 2, April 2022; pp. 628-634 |
ISSN: | 2541-2019, 2541-044X |
DOI: | 10.33395/sinkron.v7i2.11427 |
Description: | Classification algorithms in data science generally assume that the classes in the training data are equally distributed. However, datasets drawn from real problems often have an imbalanced class distribution, in which the samples split into a majority class and a minority class. The minority class is usually the one of greater interest and the more important one to identify, so correctly classifying a minority-class sample is more valuable than correctly classifying a majority-class sample. An imbalanced class distribution makes it difficult for a classification algorithm to label minority-class samples correctly. When a model performs well on majority-class samples but poorly on minority-class samples, the imbalance must be addressed. Common remedies include oversampling the minority class, undersampling the majority class, or both. In this study, the authors apply various resampling techniques, test them with various machine learning classification algorithms, and identify the combinations of resampling technique and algorithm that achieve high recall on minority-class samples while still accounting for classification of the majority class. |
Database: | OpenAIRE |
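The record does not say which resampling techniques or classifiers the study compared. As an illustration only, the sketch below implements the two simplest variants of the remedies the abstract names, random oversampling of the minority class and random undersampling of the majority class, plus the recall metric used to judge minority-class performance. The function names (`random_oversample`, `random_undersample`, `recall`) are hypothetical and do not come from the paper; only the Python standard library is used.

```python
import random
from collections import Counter

def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority-class rows until the minority
    class is as large as the largest other class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max((c for label, c in counts.items() if label != minority), default=0)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Draw duplicates with replacement; no new synthetic points are created.
    extra = [rng.choice(minority_idx) for _ in range(target - len(minority_idx))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def random_undersample(X, y, majority, seed=0):
    """Drop randomly chosen majority-class rows until the majority class
    is no larger than the largest other class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max((c for label, c in counts.items() if label != majority), default=0)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    # Sample without replacement the majority rows that survive.
    kept = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    keep = [i for i, label in enumerate(y) if label != majority or i in kept]
    return [X[i] for i in keep], [y[i] for i in keep]

def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

# Toy data: 8 majority-class (0) samples and 2 minority-class (1) samples.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y, minority=1)
print(Counter(y_over))   # Counter({0: 8, 1: 8})

X_under, y_under = random_undersample(X, y, majority=0)
print(Counter(y_under))  # Counter({0: 2, 1: 2})

# A model that always predicts the majority class looks accurate but
# never finds a minority-class sample.
always_majority = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
print(accuracy, recall(y, always_majority, positive=1))  # 0.8 0.0
```

The last lines show why the abstract emphasises recall: the always-majority model reaches 80% accuracy while its minority-class recall is zero. In practice, resampling is applied only to the training split, so that the test set keeps the real class distribution and the reported recall is not inflated.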