Hybrid Oversampling and Undersampling Method (HOUM) via Safe-Level SMOTE and Support Vector Machine.

Authors: Yilmaz Eroglu, Duygu; Pir, Mestan Sahin
Subject:
Source: Applied Sciences (2076-3417); Nov 2024, Vol. 14, Issue 22, p. 10438, 19 pp.
Abstract: Improvements in collecting and processing data with machine learning algorithms have increased interest in data mining. This trend has led to the development of real-life decision support systems (DSSs) in diverse areas such as biomedical informatics, fraud detection, natural language processing, face recognition, autonomous vehicles, image processing, and many parts of real production environments. In some of these studies, imbalanced datasets lead to low performance measures, which highlights the need for additional efforts to address this issue. In this study, the proposed method (HOUM) is used to address imbalanced datasets in classification problems. The aim of the model is to avoid both the overfitting caused by oversampling and the loss of valuable data caused by undersampling, and to obtain successful classification results on imbalanced data. HOUM is a hybrid approach that tackles imbalanced class distributions, refines datasets, and improves model robustness. In the first step, majority-class data points that are distant from the decision boundary obtained via a support vector machine (SVM) are removed. If the data are still not balanced, Safe-Level SMOTE (SLS) is employed to augment the minority-class data. This loop continues until the dataset becomes balanced. The main contribution of the proposed method is generating informative minority data with SLS and reducing non-informative majority data with the SVM before classification techniques are applied. First, the efficiency of HOUM is verified by comparison with the SMOTE, SMOTEENN, and SMOTETomek techniques on eight datasets. Then, the results of HOUM are compared with those of the W-SIMO and RusAda algorithms, which were developed for imbalanced datasets; this comparison reveals the strength of HOUM. The proposed HOUM algorithm is also applied to a real dataset obtained from a project supported by the Scientific and Technical Research Council of Turkey.
The collected data include quality-control and processing parameters for yarn. The aim of this project is to prevent yarn breakage errors during weaving on looms. This study introduces a DSS designed to prevent yarn breakage during fabric weaving. The high performance of the algorithm may encourage producers to manage yarn flow and to make further use of HOUM's efficiency as a DSS. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
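The abstract describes the HOUM loop only at a high level: drop majority points far from an SVM decision boundary, then, if the data are still imbalanced, add minority points with Safe-Level SMOTE, and repeat until balanced. The Python/NumPy sketch below illustrates that loop under stated assumptions and is not the authors' implementation. Assumptions not taken from the source: binary labels, a linear SVM fitted by plain subgradient descent on the hinge loss stands in for the paper's SVM (whose kernel and solver are not given here), "distance from the decision boundary" is read as the signed margin on a point's own side, and the helper names `linear_svm`, `safe_level_smote`, `houm` and the parameters `drop_frac`, `gen_per_iter`, `k`, `max_iter` are hypothetical. The SLS step follows the standard Safe-Level SMOTE case analysis, where a point's safe level is the number of minority points among its k nearest neighbors.

```python
import numpy as np


def linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Hypothetical stand-in SVM: linear, fitted by full-batch subgradient
    descent on the L2-regularized hinge loss. y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def safe_level_smote(X, y, minority, n_new, k=5, rng=None):
    """Create up to n_new synthetic minority points with Safe-Level SMOTE."""
    if rng is None:
        rng = np.random.default_rng(0)
    min_idx = np.flatnonzero(y == minority)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Safe level: number of minority points among the k nearest neighbors.
    safe = (y[np.argsort(dist, axis=1)[:, :k]] == minority).sum(axis=1)
    new, attempts = [], 0
    while len(new) < n_new and attempts < 20 * n_new:
        attempts += 1
        p = rng.choice(min_idx)
        # The partner n is drawn from p's k nearest minority neighbors.
        n = rng.choice(min_idx[np.argsort(dist[p, min_idx])[:k]])
        sl_p, sl_n = safe[p], safe[n]
        if sl_p == 0 and sl_n == 0:
            continue  # both look like noise: generate nothing
        if sl_n == 0:
            gap = 0.0  # partner is unsafe: duplicate p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:
                gap = rng.uniform(0.0, 1.0)
            elif ratio > 1:
                gap = rng.uniform(0.0, 1.0 / ratio)  # stay close to p
            else:
                gap = rng.uniform(1.0 - ratio, 1.0)  # stay close to n
        new.append(X[p] + gap * (X[n] - X[p]))
    return np.array(new).reshape(-1, X.shape[1])


def houm(X, y, minority=1, drop_frac=0.1, gen_per_iter=20, k=5, max_iter=50, seed=0):
    """HOUM-style loop (sketch): SVM-guided undersampling, then SLS, until balanced."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float).copy(), np.asarray(y).copy()
    for _ in range(max_iter):
        n_min = int((y == minority).sum())
        gap = (len(y) - n_min) - n_min  # majority count minus minority count
        if gap <= 0:
            break
        # Step 1: drop majority points farthest from the SVM boundary
        # (largest signed margin on their own side).
        ys = np.where(y == minority, 1.0, -1.0)
        w, b = linear_svm(X, ys)
        margin = ys * (X @ w + b) / max(np.linalg.norm(w), 1e-12)
        maj_idx = np.flatnonzero(y != minority)
        n_drop = min(int(np.ceil(drop_frac * len(maj_idx))), gap // 2)
        drop = maj_idx[np.argsort(margin[maj_idx])[::-1][:n_drop]]
        keep = np.setdiff1d(np.arange(len(y)), drop)
        X, y = X[keep], y[keep]
        gap -= n_drop
        # Step 2: if still imbalanced, oversample the minority with SLS.
        if gap > 0:
            X_new = safe_level_smote(X, y, minority, min(gap, gen_per_iter), k, rng)
            X = np.vstack([X, X_new])
            y = np.concatenate([y, np.full(len(X_new), minority, dtype=y.dtype)])
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (30, 2))])
    y = np.array([0] * 200 + [1] * 30)
    Xb, yb = houm(X, y)
    print("class counts before:", np.bincount(y), "after:", np.bincount(yb))
```

Capping the number of dropped majority points at half of the remaining class gap means the loop never overshoots, so it ends with equal class counts whenever SLS manages to generate the requested points; `max_iter` guards against cases where it cannot.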