Research on User Consumption Behavior Prediction Based on Improved XGBoost Algorithm
Author: | Yan Xiangbin, Wang Xingfen, Ma Yangchun |
---|---|
Year of publication: | 2018 |
Subject: | 020203 distributed computing; Training set; Computer science; Feature vector; Decision tree; 020206 networking & telecommunications; Sample (statistics); 02 engineering and technology; Logistic regression; Set (abstract data type); Test set; 0202 electrical engineering, electronic engineering, information engineering; Cluster analysis; Algorithm |
Source: | IEEE BigData |
DOI: | 10.1109/bigdata.2018.8622235 |
Description: | This paper proposes an improved algorithm for modeling user consumption behavior that combines logistic regression with the XGBoost algorithm to predict users' purchasing behavior on an e-commerce website. XGBoost is used as a feature transformation: it first makes predictions on the samples, and, according to the prediction results, the information from each regression tree is used to construct a new feature vector, which becomes the input to the logistic regression model. The authors' previously improved clustering algorithm [1] is used to cluster users into different groups for further comparative analysis with the three predictive models in this paper. Specifically, more than 50 million original records are collected and preprocessed for correlation mining; 60% are selected at random as the training set, 20% as the validation set, and the remaining 20% as the test set. Logistic regression and XGBoost are each used to build a model, making use of the advantages of each. The results show that applying logistic regression on top of XGBoost is feasible and that the evaluation metrics of the combined model are better than those of either method used alone. |
Database: | OpenAIRE |
External link: |
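
The description mentions a random 60/20/20 split and a new feature vector built from each regression tree. The abstract does not say how that vector is constructed. One common approach, assumed here, is to one-hot encode the leaf that each sample reaches in every tree and concatenate the results. The sketch below uses only the Python standard library. The function names `split_60_20_20` and `one_hot_leaves` are hypothetical, and the leaf indices are assumed to be already remapped to `0..k-1` within each tree (in practice they might come from something like `XGBClassifier.apply`, whose raw node ids would need that remapping first).

```python
import random
from typing import List, Sequence, Tuple


def split_60_20_20(n_samples: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition sample indices into 60% train, 20% validation, 20% test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_train = n_samples * 6 // 10         # integer arithmetic avoids float rounding
    n_valid = n_samples * 2 // 10
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # whatever remains (about 20%)
    return train, valid, test


def one_hot_leaves(leaf_ids: Sequence[Sequence[int]],
                   leaves_per_tree: Sequence[int]) -> List[List[int]]:
    """Concatenate one one-hot block per tree into a single feature vector per sample.

    leaf_ids[i][t] is the (assumed 0-based, contiguous) index of the leaf that
    sample i reaches in tree t; leaves_per_tree[t] is the leaf count of tree t.
    """
    # Starting position of each tree's block inside the concatenated vector.
    offsets = []
    total = 0
    for n_leaves in leaves_per_tree:
        offsets.append(total)
        total += n_leaves

    features = []
    for row in leaf_ids:
        vec = [0] * total
        for t, leaf in enumerate(row):
            vec[offsets[t] + leaf] = 1    # mark the leaf reached in tree t
        features.append(vec)
    return features


if __name__ == "__main__":
    train_idx, valid_idx, test_idx = split_60_20_20(10, seed=1)
    print(len(train_idx), len(valid_idx), len(test_idx))

    # Two samples, two trees with 2 and 3 leaves respectively (toy values).
    print(one_hot_leaves([[0, 2], [1, 0]], [2, 3]))
```

The resulting binary vectors would then be passed to a logistic regression model as its input features; that fitting step is omitted here because it depends on the library chosen.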