Automated heuristic based context dependent ETL process to generate multi‐dimensional model for tabular data.

Authors: Hira, Swati; Deshpande, Parag S.
Subject:
Source: Concurrency & Computation: Practice & Experience; Jan 2023, Vol. 35, Issue 2, pp. 1-31 (31 pages)
Abstract: Summary: Over the past decade, enterprises have broadly adopted data warehousing across their activities. Today, abundant information is available on websites in the form of tables or spreadsheets. This large volume of data cannot be processed directly because of its complexity, its heterogeneity, and the gap between the data and user requirements. In this work, an automatic approach is proposed to build the multi‐dimensional structure (MDS) of heterogeneous tabular data for intelligent decision‐making. The proposed MDS is generated by identifying components such as dimensions and hierarchies. The approach automatically extracts measures based on the spatial characteristics of data dimensions such as region and time, as well as their hierarchies. It generates a multi‐dimensional model for BI tools without a complicated ETL (Extraction, Transformation and Loading) process and helps to answer business queries such as "Top 5 states in India based on Irrigated area in 2009". Moreover, the proposed method substantially reduces the time and cost of building multi‐dimensional models. The correctness of the proposed method is tested on synthetic datasets and on economic datasets from government websites, where information is stored in tabular formats, and in various heterogeneous setups, where the proposed method saved approximately 4000 to 5000 computing hours of the ETL process. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
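Note: The abstract's example query ("Top 5 states in India based on Irrigated area in 2009") can be read as a top-N aggregation over a region dimension, filtered on a time dimension. The sketch below illustrates only that query shape; it is not the authors' method. The fact rows, the state names, the measure names, and the `top_n` helper are all hypothetical and the values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (state, year, measure_name, value).
# All names and values are invented; they are not data from the paper.
FACTS = [
    ("State A", 2009, "irrigated_area", 120.0),
    ("State B", 2009, "irrigated_area", 300.0),
    ("State C", 2009, "irrigated_area", 210.0),
    ("State A", 2010, "irrigated_area", 500.0),
    ("State B", 2009, "sown_area", 900.0),
    ("State C", 2009, "irrigated_area", 15.0),
]


def top_n(facts, measure, year, n):
    """Return the n states with the largest total of `measure` in `year`."""
    totals = defaultdict(float)
    for state, fact_year, name, value in facts:
        # Slice on the time dimension and select a single measure.
        if name == measure and fact_year == year:
            # Roll up to the state level of the region dimension.
            totals[state] += value
    # Sort by total (descending); break ties by state name for stable output.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    for state, total in top_n(FACTS, "irrigated_area", 2009, 2):
        print(state, total)
```

In a BI tool the same question would normally be posed against the generated multi‐dimensional model rather than against raw rows; the flat list here simply keeps the example self-contained.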