Abstract: |
Telecommunications services are developing rapidly and attract millions of users through convenient interaction, promotion, and communication. The resulting volume of daily transaction records accumulates into large datasets over time. These data are a valuable resource for understanding user habits and needs and for devising strategies to retain existing customers and attract potential ones. Exploiting them requires a system that can collect, store, and analyze large datasets efficiently. In this article, we introduce Florus, a big data framework based on the Lakehouse architecture that is designed to address these challenges. Using this framework, we propose an approach to analyzing customer behavior in the telecommunications industry on a large dataset. Our work focuses on the analysis of a large volume of data stored in tables with different schemas that reflect business operations over time. Clustering with the Bisecting K-Means algorithm supports the exploration of customer segments that vary in density and complexity and characterizes them as homogeneous groups, giving a better understanding of market demand. Furthermore, the enterprise can forecast revenue at different levels, and the forecast can be applied to every customer. We tested the forecast with a Gradient Boosted Tree model at the end of a data enrichment and transformation pipeline. Overall, this work highlights the potential of Florus for supporting customer analysis experiments. Implementing the framework would enhance the ability to conduct comprehensive analyses across the entire customer lifecycle. |