QETL: An approach to on-demand ETL from non-owned data sources
Authors: | Simone Graziani, Lorenzo Baldacci, Matteo Golfarelli, Stefano Rizzi |
---|---|
Contributors: | Lorenzo Baldacci, Matteo Golfarelli, Simone Graziani, Stefano Rizzi |
Year of publication: | 2017 |
Subject: |
Information Systems and Management
Source data; Database; Process (engineering); Computer science; Online analytical processing; Data reuse; InformationSystems_DATABASEMANAGEMENT; On-demand ETL; Incremental loading; OLAP; Data provider; Cube (algebra); 02 engineering and technology; computer.software_genre; Data warehouse; 020204 information systems; On demand; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; computer |
Source: | Data & Knowledge Engineering. 112:17-37 |
ISSN: | 0169-023X |
Description: | In traditional OLAP systems, the ETL process loads all available data into the data warehouse before users start querying them. In some cases, this may be either inconvenient (because data are supplied by a provider for a fee) or infeasible (because of their size); on the other hand, directly launching each analysis query on source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is that of fetching and storing data on demand, i.e., as they are needed during the analysis process. To this end, we propose the Query-Extract-Transform-Load (QETL) paradigm to feed a multidimensional cube; the idea is to fetch facts from the source data provider, load them into the cube only when they are needed to answer some OLAP query, and drop them when free space is needed to load other facts. Notably, QETL includes an optimization step to cheaply extract the required data based on the specific features of the data provider. Experimental tests on a real case study in genomics show that QETL effectively reuses data to cut extraction costs, leading to significant performance improvements. |
Database: | OpenAIRE |
External link: |
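
The on-demand idea in the description (fetch facts only when a query needs them, reuse what is already loaded, drop facts when space runs out) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `OnDemandCube`, the provider function `fake_provider`, the chunk keys such as `"chr1"`, and the least-recently-used eviction policy are all assumptions made for illustration, and the sketch omits the extraction-optimization step entirely.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable, List


class OnDemandCube:
    """Toy fact store that loads chunks of facts only when a query asks for them.

    Hypothetical design: chunk granularity and LRU eviction are assumptions,
    not details taken from the QETL paper.
    """

    def __init__(self, fetch: Callable[[Hashable], List[dict]], capacity: int):
        self._fetch = fetch          # hypothetical provider call, one chunk per key
        self._capacity = capacity    # maximum number of chunks kept locally
        self._chunks: "OrderedDict[Hashable, List[dict]]" = OrderedDict()
        self.fetch_count = 0         # number of (possibly costly) provider calls

    def query(self, keys: Iterable[Hashable]) -> List[dict]:
        """Return the facts for the requested chunks, fetching only missing ones."""
        facts: List[dict] = []
        for key in keys:
            if key in self._chunks:
                chunk = self._chunks[key]
                self._chunks.move_to_end(key)         # mark as recently used
            else:
                chunk = self._fetch(key)              # extract from the provider
                self.fetch_count += 1
                self._chunks[key] = chunk             # load into the local store
                while len(self._chunks) > self._capacity:
                    self._chunks.popitem(last=False)  # drop least recently used
            facts.extend(chunk)
        return facts

    def cached_keys(self) -> List[Hashable]:
        """Keys currently held locally, least recently used first."""
        return list(self._chunks)


def fake_provider(key: Hashable) -> List[dict]:
    """Stand-in for a remote data provider (assumption: one fact per chunk)."""
    return [{"chunk": key, "measure": len(str(key))}]


if __name__ == "__main__":
    cube = OnDemandCube(fake_provider, capacity=2)
    cube.query(["chr1", "chr2"])   # both chunks are fetched
    cube.query(["chr1"])           # answered from local data, no new fetch
    cube.query(["chr3"])           # fetched; the least recently used chunk is dropped
    print(cube.fetch_count, cube.cached_keys())
```

Counting provider calls in `fetch_count` mirrors the cost concern raised in the description: every query answered from already-loaded chunks avoids a paid or slow extraction.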