Instant-on scientific data warehouses: Lazy ETL for data-intensive research
Authors: | Y. Kargin, Holger Pirk, Stefan Manegold, Milena Ivanova, Martin L. Kersten |
---|---|
Contributors: | Database Architectures, Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands |
Language: | English |
Year of publication: | 2012 |
Source: | Lecture Notes in Business Information Processing, BIRTE; ISBN: 9783642398711 |
Abstract: | At the dawn of the data-intensive research era, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive, and it may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the cost of data loading: Lazy ETL. Data is extracted and loaded transparently on the fly, and only for the required data items. Extensive experiments demonstrate a significant reduction in the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the cost of bootstrapping a scientific data warehouse, our approach also reduces the cost of loading new incoming data. |
Database: | OpenAIRE |
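The core idea of the abstract, extracting and loading only the data items a query actually requires, can be illustrated with a minimal Python sketch. It assumes that cheap per-file metadata is known up front and that each source file can be parsed independently. The class `LazyWarehouse`, the extractor functions, and the station-based metadata are hypothetical illustrations, not the authors' implementation, which this record does not describe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A row of measurements: (row label, value). Hypothetical schema.
Row = Tuple[str, float]


@dataclass
class LazyWarehouse:
    """Sketch of lazy ETL: per-file metadata is known up front, while the
    bulky rows of a source file are extracted only when a query first
    needs that file, and are then cached in the warehouse."""

    # Hypothetical: maps a source file name to a function that parses it.
    extractors: Dict[str, Callable[[], List[Row]]]
    # Hypothetical eagerly loaded metadata: file name -> station identifier.
    metadata: Dict[str, str]
    cache: Dict[str, List[Row]] = field(default_factory=dict)
    extractions: int = 0  # number of source files actually extracted so far

    def _load(self, name: str) -> List[Row]:
        # Extract and load a file on first access only; reuse the cache afterwards.
        if name not in self.cache:
            self.cache[name] = self.extractors[name]()
            self.extractions += 1
        return self.cache[name]

    def query(self, station: str) -> List[Row]:
        # Use the metadata to select files, then load only those files.
        rows: List[Row] = []
        for name, file_station in self.metadata.items():
            if file_station == station:
                rows.extend(self._load(name))
        return rows


if __name__ == "__main__":
    def make_extractor(name: str, values: List[float]) -> Callable[[], List[Row]]:
        # Stand-in for parsing a file from an external repository.
        return lambda: [(f"{name}#{i}", v) for i, v in enumerate(values)]

    wh = LazyWarehouse(
        extractors={
            "a1.dat": make_extractor("a1.dat", [1.0, 2.0]),
            "a2.dat": make_extractor("a2.dat", [3.0]),
            "b1.dat": make_extractor("b1.dat", [4.0, 5.0]),
        },
        metadata={"a1.dat": "A", "a2.dat": "A", "b1.dat": "B"},
    )
    print(len(wh.query("A")), wh.extractions)  # 3 2
    wh.query("A")
    print(wh.extractions)  # still 2: served from the cache
```

In this sketch, file `b1.dat` is never parsed because no query touches station "B", which mirrors the claim that data outside a user's subset of interest need not be loaded at all. Newly arriving files would only require a metadata entry until a query asks for them.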