Research on User Consumption Behavior Prediction Based on Improved XGBoost Algorithm

Author: Yan Xiangbin, Wang Xingfen, Ma Yangchun
Year of publication: 2018
Subject:
Source: IEEE BigData
DOI: 10.1109/bigdata.2018.8622235
Description: This paper proposes an improved algorithm for modeling user consumption behavior that combines logistic regression with the XGBoost algorithm to predict users' purchasing behavior on an e-commerce website. XGBoost is used as a feature transformation: it first makes predictions on the samples, and the information from each regression tree in those predictions is used to construct a new feature vector, which becomes the input to the logistic regression model. A previously improved clustering algorithm [1] is used to cluster users into segments for further comparative analysis with the three predictive models in this paper. Specifically, more than 50 million original records are collected and preprocessed for correlation mining; 60% are randomly selected as the training set, 20% as the validation set, and the remaining 20% as the test set. Logistic regression and XGBoost are each used to build a model, making use of the advantages of each algorithm. The results show that applying logistic regression on top of XGBoost is feasible and that the model's evaluation metrics are better than those of either method used alone.
Database: OpenAIRE
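
The feature transformation described in the abstract can be illustrated with a minimal, library-free sketch. It is not the authors' implementation: the function names (`split_60_20_20`, `fit_leaf_vocabulary`, `encode_leaves`) and the toy leaf ids are hypothetical, and the sketch assumes that "the information of each regression tree" means the index of the leaf each sample falls into, one-hot encoded per tree. In practice the leaf ids would probably come from a trained booster (for example, XGBoost's leaf-prediction option, `pred_leaf=True`), and the encoded vectors would then be passed to a logistic regression model.

```python
import random


def split_60_20_20(n_samples, seed=0):
    """Shuffle sample indices and split them 60/20/20 (train/validation/test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 6 // 10
    n_valid = n_samples * 2 // 10
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


def fit_leaf_vocabulary(leaf_rows):
    """Assign one output column to every (tree position, leaf id) pair seen."""
    vocab = {}
    for row in leaf_rows:
        for tree, leaf in enumerate(row):
            vocab.setdefault((tree, leaf), len(vocab))
    return vocab


def encode_leaves(leaf_rows, vocab):
    """One-hot encode leaf ids; leaves not in the vocabulary are left as zeros."""
    encoded = []
    for row in leaf_rows:
        vec = [0] * len(vocab)
        for tree, leaf in enumerate(row):
            col = vocab.get((tree, leaf))
            if col is not None:
                vec[col] = 1
        encoded.append(vec)
    return encoded


# Hypothetical leaf ids for 3 samples across 2 trees (made-up values,
# standing in for the output of a trained gradient-boosted model).
toy_leaves = [[3, 5], [4, 5], [3, 6]]
vocab = fit_leaf_vocabulary(toy_leaves)
features = encode_leaves(toy_leaves, vocab)  # rows to feed a logistic regression
```

Fitting the vocabulary on the training portion only, and reusing it for the validation and test portions, keeps the three sets of feature vectors aligned to the same columns.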