Abstract: |
This paper presents a predictive model for detecting new customers of network-distributed natural gas service who have not completed the connection process but are consuming gas without being billed. The CRISP-DM methodology was followed. First, the process for managing gas service connection requests from new customers was analyzed; then, customer data and data associated with each stage of the process were identified, together with an indicator flagging atypical customers, that is, those who were disconnected from the service once they were found to be consuming without having completed the process. This yielded a dataset of 6020 requests processed between 2020 and 2022. The algorithms used included Regularized Logistic Regression, Partial Least Squares (PLS) regression, and Extreme Gradient Boosting (XGBoost). In the evaluation, XGBoost obtained the best metrics, with an accuracy of 82% and an AUC-ROC of 93.6%. SHAP values were used to interpret the results and showed that the most relevant variables for detection are the time the request has spent in the connection process, the customer's geographical location, and the stages the request has completed within the connection process. The results improved the ability to detect customers who should be inspected or disconnected from the service. In future work, the accuracy of the model could be improved through field validation of the predictions and the addition of variables external to the process. [ABSTRACT FROM AUTHOR] |