Should they stay or should they go? Predict whether consumers are going to leave their energy supplier

Author:	Vezzoli, M, Zogmaister, C
Contributors:	Vezzoli, M, Zogmaister, C
Language:	English
Year of publication:	2018
Subject:	Big Data; Churn prediction; Predictive modelling; Data mining
Description:	Background. The Italian energy market is going through a liberalisation process. After many years of monopoly by the supplier Enel, the market will be entirely liberalised from 2019. This full liberalisation will be the result of a process that began in 1999. In the initial phases of the liberalisation, the new companies entering the Italian market were mainly focused on attracting new customers. Now, the attention has shifted toward retaining existing ones. This change requires building statistical models to predict which clients intend to churn (i.e., to leave the company), understanding the reasons behind this intention, and developing strategies aimed at customer retention. In this era of Big Data, these goals are made possible by the fact that companies are flooded with a massive amount of very different types of data (mainly socio-demographic information and past interactions between client and company) that could allow a better understanding of the complex psychological dynamics of churn behaviour. Objectives. Since literature on churn prediction in the energy market is still lacking, the primary aim of this study is to develop an initial churn prediction model for the electricity market. To attain this goal, we used data mining and machine learning methodologies and techniques. Secondly, we aim to detect which information about consumers is most predictive in this particular context and, therefore, to shed some light on the reasons that lie behind churn behaviour. Research question(s) and/or hypothesis/es. This study aims to explore and understand energy consumer behaviour through a data-driven approach. One of the essential characteristics of big data is the variety of information we can hold about consumers.
Modelling this amount of information could allow us to recognise the hidden value of consumer characteristics that have never been considered before, possibly because of the theory-driven approach that typically characterises psychological research. On the other hand, modelling may confirm the value of other features that were identified by theory-driven models of consumer behaviour. Method/Approach. To build a predictive model, we used a data mining approach, an analytic process for finding unknown relationships between information about consumers and their future behaviour. In the first phase of the process, we dealt with the issue of data quality: the data needed to be cleaned and prepared for modelling. After this stage, we ended up with a dataset composed of 81836 consumers, each of whom holds one domestic electricity contract. The set of predictors consists of demographic (e.g. age), account (e.g. length of the contract), behavioural (e.g. the number of contacts the consumer had with the company) and socio-economic information (e.g. whether the customer lives in a wealthy or struggling area). In the second phase, the data were modelled with two families of machine learning algorithms: decision trees (CART and C5.0) and logistic regression. Each model was first built on a training set (n = 57269; 70% of the dataset); then the best-trained model was tested on a set of unseen data (n = 24544; 30% of the dataset). To assess model performance, we used the Area Under the ROC Curve (AUC) metric. Finally, as the distribution of instances in the criterion variable is highly uneven (8% churners and 92% non-churners), we had to handle the class imbalance issue. Class imbalance may cause churn models to break down because of the lack of information about the minority class. To address the uneven distribution within the criterion variable, we used SMOTE resampling, which oversamples the minority class using its k-nearest-neighbour graph.
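As an illustration of the SMOTE idea mentioned above, here is a minimal sketch in plain Python. It is not the authors' implementation (the abstract does not name the software used); the function name `smote`, its parameters, and the toy churner records are all hypothetical. Each synthetic instance is placed at a random position on the segment between a minority-class instance and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration only; all features must be numeric.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority instance as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours by Euclidean distance, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        # New point at a random position on the segment base -> other.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Invented minority class: (contract length in years, number of contacts).
churners = [(1.0, 4.0), (1.5, 5.0), (2.0, 3.0), (0.5, 6.0)]
new_points = smote(churners, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real churners, it always lies inside the range already covered by the minority class. The sketch handles only numeric features; categorical predictors such as acquisition channel would need a variant of the technique.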
The SMOTE training dataset is composed of 53328 consumers. On this resampled dataset, we trained and tested the same algorithms we used for the unbalanced data. Results. We built six different models, which differ in the algorithm used and in whether or not the minority class was resampled. Logistic regression on the unbalanced dataset reached the best performance (AUC on the training set = 0.67 and AUC on the test set = 0.68). In addition to providing the best performance, logistic regression achieved more stable results between training and testing than the other models did. For each feature, we calculated the odds ratio to assess the direction and strength of the predictive relationship between the predictor and the criterion. The features that increase the likelihood of churn are the regional area, the type of acquisition channel, the type of contract and whether the consumer has already churned on previous contracts. Conversely, the features that decrease the likelihood of future churn are the length of the contract, subscription to a loyalty program and having received cross-sell offers. Discussion. Although the performance of the model was modest, we achieved our primary objective. This model enabled us to reach a first understanding of consumer churn behaviour. We found that subscription to a loyalty program is one of the most important predictors in our model. Being loyal, that is, being committed to the company over time regardless of changes in competitors' pricing or in the external environment, has crucial psychological importance. We do not know whether loyalty affects the likelihood of churn directly or whether it has an impact because loyal consumers share some characteristics that decrease the likelihood of churn.
Therefore, we should establish whether loyalty is a protective factor against churn, what its psychological role is, and whether the various components of loyalty may have different impacts on churn behaviour. The results we have reached open up a wide range of research opportunities. More specifically, we could try to extend the model to other consumer segments (e.g. domestic gas consumers or B2B), use different algorithms for building models, or understand more deeply the causal relationship between a specific predictor (e.g. loyalty) and churn behaviour through various experiments.
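The evaluation steps described in the abstract (fit a logistic regression on a training set, score unseen data with the AUC, and read each predictor's odds ratio) can be sketched as follows. This is a toy example under stated assumptions, not the study's code or data: the customer counts are invented, the single binary predictor stands in for loyalty-program subscription, and the helpers `auc`, `fit_logistic`, `make_rows` and `predict` are hypothetical names. The odds ratio is taken here as the exponential of the fitted coefficient, which is the usual reading for logistic regression.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve: chance that a churner outscores a non-churner (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit an intercept and weights by batch gradient descent on the mean log-loss."""
    n_features = len(rows[0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return bias, weights


def make_rows(loyal_churn, loyal_stay, other_churn, other_stay):
    """Build a toy dataset with one binary predictor: 1.0 = loyalty-program subscriber."""
    rows = [[1.0]] * (loyal_churn + loyal_stay) + [[0.0]] * (other_churn + other_stay)
    labels = [1] * loyal_churn + [0] * loyal_stay + [1] * other_churn + [0] * other_stay
    return rows, labels


# Invented counts, split 70/30 between training and test sets.
train_x, train_y = make_rows(7, 63, 21, 49)
test_x, test_y = make_rows(3, 27, 9, 21)

bias, weights = fit_logistic(train_x, train_y)


def predict(x):
    """Predicted churn probability for one customer."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


test_auc = auc(test_y, [predict(x) for x in test_x])
# Odds ratio per predictor: below 1 lowers the odds of churn, above 1 raises them.
odds_ratios = [math.exp(w) for w in weights]
```

In this toy data the loyalty predictor has an odds ratio below 1, matching the direction reported in the abstract for loyalty-program subscription. As the abstract itself notes, such a ratio describes a predictive association and does not by itself show that loyalty causes lower churn.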
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=od______1299::7339289d66ce7fb4400cb55619ba0a84 https://hdl.handle.net/10281/199477