Popis: |
Data analytics is pervasive in retailing as a key tool for gaining customer insights. Often, the data sets used are not only large but also rich, i.e., they contain specific information, including demographic details, about individual customers. Typical uses of analytics include personalized recommendations, churn prediction, and estimation of customer lifetime value. In this application paper, an investigation is carried out using a very large real-world data set from the fashion retailing industry, containing only limited information. Specifically, while purchases can be connected to individual customers, no additional information about the customers is available. With this in mind, the main purpose is to discover what the company can learn about its business and its customers as a group, based on the available data. The exploratory analysis uses data from four years, where each year has more than 1 million customers and 6 million transactions. Using traditional RFM (Recency, Frequency and Monetary) analysis, including an examination of the transitions between segments across two years, some interesting patterns can be observed. As an example, more than half of the customers are replaced each year. In a second experiment, the possibility of predicting which customers are the most likely not to make a purchase the next year is examined. Interestingly, while the two algorithms evaluated obtained very similar F-measures, the random forest had substantially higher precision, while gradient boosting showed clearly better recall. In the last experiment, targeting only the customers who have remained loyal for at least three years, rule sets describing patterns and trends that are strong indicators of whether or not a customer will churn are inspected and analyzed. |