On-Demand Snapshot Maintenance in Data Warehouses Using Incremental ETL Pipeline

Authors: Weiping Qu, Stefan Dessloch
Publication year: 2017
Source: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXII, ISBN: 9783662556078
DOI: 10.1007/978-3-662-55608-5_5
Description: Multi-version concurrency control (MVCC) is now widely used in data warehouses to give OLAP queries and ETL maintenance flows concurrent access. A snapshot of the existing warehouse tables is taken so that a given query can be answered independently of concurrent updates. In this work, we extend the snapshot in the data warehouse with the deltas that reside at the source side of ETL flows. Before a query that accesses the warehouse tables is answered, the relevant tables are first refreshed with exactly those source deltas that were captured up to the query's arrival and have not yet been synchronized with the tables (called on-demand maintenance). Snapshot maintenance is performed by an incremental recomputation pipeline, which is flushed by consecutive, non-overlapping delta batches; the delta streams are split into these batches according to the sequence of incoming queries. A workload scheduler is used to obtain a serializable schedule of concurrent maintenance jobs and OLAP queries. Performance was evaluated using read-heavy and update-heavy workloads.
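The on-demand maintenance idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Delta`, `split_deltas_by_queries` and `answer_queries` are hypothetical, a warehouse table is modeled as a per-key sum, and maintenance jobs and queries run strictly one after another (trivially serializable) instead of being ordered by the paper's workload scheduler. The sketch shows how a delta stream is cut into consecutive, non-overlapping batches at query arrival times, and how each batch is applied incrementally just before its query reads a snapshot.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Delta:
    """Hypothetical source-side change record."""
    ts: int      # capture timestamp at the source
    key: str     # grouping key in the warehouse table
    amount: int  # signed change to the per-key sum


def split_deltas_by_queries(deltas: List[Delta], query_ts: List[int]) -> List[List[Delta]]:
    """Split a delta stream into consecutive, non-overlapping batches.

    Assumes query_ts is sorted ascending with distinct values. Batch i holds
    the deltas with query_ts[i-1] < ts <= query_ts[i]; deltas newer than the
    last query stay unsynchronized and belong to no batch.
    """
    batches: List[List[Delta]] = [[] for _ in query_ts]
    for d in deltas:
        i = bisect_left(query_ts, d.ts)  # first query arriving at or after d.ts
        if i < len(query_ts):
            batches[i].append(d)
    return batches


def answer_queries(deltas: List[Delta], query_ts: List[int]) -> List[Tuple[int, Dict[str, int]]]:
    """Refresh the table on demand before each query, then record its snapshot."""
    table: Dict[str, int] = {}
    answers: List[Tuple[int, Dict[str, int]]] = []
    for q_ts, batch in zip(query_ts, split_deltas_by_queries(deltas, query_ts)):
        # Incremental maintenance: apply only the deltas captured since the previous query.
        for d in batch:
            table[d.key] = table.get(d.key, 0) + d.amount
        # The query sees a copy of the table, unaffected by later maintenance.
        answers.append((q_ts, dict(table)))
    return answers


if __name__ == "__main__":
    stream = [Delta(1, "a", 5), Delta(3, "b", 2), Delta(4, "a", -1),
              Delta(7, "b", 4), Delta(9, "a", 10)]
    for q_ts, snapshot in answer_queries(stream, [3, 8]):
        print(q_ts, snapshot)
```

With queries arriving at timestamps 3 and 8, the first snapshot reflects only the deltas at timestamps 1 and 3, the second adds those at 4 and 7, and the delta at 9 remains pending until a later query arrives. Using `bisect_left` makes a delta captured exactly at a query's arrival time visible to that query; the opposite boundary convention would be an equally plausible assumption.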
Database: OpenAIRE