A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification

Authors: Gang Chen, Binjie Hou, Tiangang Lei
Language: English
Publication year: 2023
Subject:
Source: Mathematical Biosciences and Engineering, Vol 20, Iss 10, Pp 17866-17885 (2023)
Document type: article
ISSN: 1551-0018
DOI: 10.3934/mbe.2023794
Description: Imbalanced data classification has been a major topic in the machine learning community. Various approaches have been proposed in recent years, and researchers have paid considerable attention to data-level and algorithm-level techniques. However, existing methods often generate samples in specific regions without considering the complexity of imbalanced distributions. This can lead learning models to overemphasize certain difficult factors in the minority data. In this paper, a Monte Carlo sampling algorithm based on the Gaussian Mixture Model (MCS-GMM) is proposed. In MCS-GMM, we use the Gaussian Mixture Model to fit the distribution of the imbalanced data and apply the Monte Carlo algorithm to generate new data. Then, to reduce the impact of data overlap, the three-sigma rule is used to divide the data into four types, and the weight of each minority class instance is computed based on its neighbors and probability density function. Experiments on Knowledge Extraction based on Evolutionary Learning (KEEL) datasets show that our method is effective and outperforms existing approaches such as the Synthetic Minority Over-sampling TEchnique (SMOTE).
Database: Directory of Open Access Journals
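
The abstract's core idea (fit a Gaussian mixture to the data, then draw new samples from it by Monte Carlo) can be illustrated with a minimal sketch. This is not the authors' MCS-GMM: it is a one-dimensional toy that uses only the Python standard library, it omits the three-sigma partition and the instance weighting, and the function names `fit_gmm_1d` and `sample_gmm_1d`, the toy data, and all parameter values are assumptions made for illustration.

```python
import math
import random


def fit_gmm_1d(xs, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)  # initialise means from randomly chosen data points
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density well defined
    return weights, means, variances


def sample_gmm_1d(weights, means, variances, n, seed=0):
    """Monte Carlo draw: pick a component by its weight, then sample its Gaussian."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return samples


# Toy minority class drawn from two clusters (assumed data, not from the paper)
data_rng = random.Random(42)
minority = [data_rng.gauss(0.0, 0.5) for _ in range(30)]
minority += [data_rng.gauss(5.0, 0.5) for _ in range(30)]

weights, means, variances = fit_gmm_1d(minority, k=2)
synthetic = sample_gmm_1d(weights, means, variances, n=100)
```

Because the synthetic points are drawn from the fitted density rather than interpolated between neighboring instances, they follow the estimated shape of the minority distribution, which is the contrast with SMOTE-style generation that the abstract draws.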