CTGAN-ENN: a tabular GAN-based hybrid sampling method for imbalanced and overlapped data in customer churn prediction

Authors: I Nyoman Mahayasa Adiputra, Paweena Wanchai
Language: English
Publication year: 2024
Subject:
Source: Journal of Big Data, Vol 11, Iss 1, Pp 1-25 (2024)
Document type: article
ISSN: 2196-1115
DOI: 10.1186/s40537-024-00982-x
Description: Abstract: Class imbalance is one of many problems in customer churn datasets. Another common problem is class overlap, where instances from different classes are similar to one another. Customer churn prediction becomes more challenging when the training data contain class overlap. In this research, we propose a hybrid method based on tabular GANs, called CTGAN-ENN, to address class overlap and class imbalance in customer churn datasets. We used five different customer churn datasets from an open platform. CTGAN is a tabular GAN-based oversampling method that addresses class imbalance but suffers from class overlap. We combined CTGAN with the ENN under-sampling technique to overcome the class overlap. CTGAN-ENN reduced the class overlap for each feature in all datasets. We investigated how effective CTGAN-ENN is with each machine learning technique. In our experiments, CTGAN-ENN achieved satisfactory performance with the KNN, GBM, XGB and LGB machine learning algorithms for customer churn prediction. We compared CTGAN-ENN with common over-sampling and hybrid sampling methods, and CTGAN-ENN outperformed the other sampling methods and algorithm-level methods with cost-sensitive learning in several machine learning algorithms. We also compare the time consumption of CTGAN and CTGAN-ENN; CTGAN-ENN consumed less time than CTGAN. This work provides a new framework for customer churn prediction on several types of imbalanced datasets and can be useful for real-world customer churn data.
Database: Directory of Open Access Journals
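
The hybrid sampling idea summarized in the abstract (oversample the minority class, then clean overlapping instances with ENN) can be illustrated with a minimal sketch. This is not the authors' implementation. The oversampling step below is a hypothetical placeholder that jitters copies of minority rows; it stands in for CTGAN, which would generate the synthetic rows from a trained tabular GAN. The ENN step applies the usual editing rule (drop a row whose label disagrees with the majority of its k nearest neighbours) to all classes. The toy data, the function names, the noise level and k=3 are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def oversample_minority(X, y, minority, rng, noise=0.05):
    """Placeholder generator (NOT CTGAN): add jittered copies of minority
    rows until the minority class is as large as the largest class."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority]
    synthetic = [
        [v + rng.gauss(0.0, noise) for v in rng.choice(minority_rows)]
        for _ in range(n_needed)
    ]
    return X + synthetic, y + [minority] * n_needed


def edited_nearest_neighbours(X, y, k=3):
    """ENN editing rule: keep a row only if its label matches the majority
    label among its k nearest neighbours (Euclidean distance)."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Toy data (assumed): label 0 = non-churn (majority), label 1 = churn (minority).
majority = [
    [0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [-0.2, 0.2],
    [0.2, -0.3], [-0.1, -0.2], [0.4, 0.3], [-0.3, -0.1],
    [5.0, 5.1],  # a non-churn row that overlaps the churn region
]
minority = [[5.0, 5.0], [5.2, 4.9], [4.9, 5.2]]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

rng = random.Random(0)
X_bal, y_bal = oversample_minority(X, y, minority=1, rng=rng)  # step 1: balance
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)     # step 2: clean overlap

print("original:", dict(Counter(y)))
print("after oversampling:", dict(Counter(y_bal)))
print("after ENN:", dict(Counter(y_clean)))
```

In this toy example the oversampling step balances the two classes, and the ENN step then removes the non-churn row that sits inside the churn cluster, since all of its nearest neighbours carry the other label. With real data, the placeholder generator would be replaced by a trained CTGAN model and the cleaned set would be passed to the classifier.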