Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses

Authors: Vinanthi Basavaraj, Weiping Qu, Sahana Shankar, Stefan Dessloch
Publication year: 2015
Subject:
Source: Big Data Analytics and Knowledge Discovery ISBN: 9783319227283
DaWaK
DOI: 10.1007/978-3-319-22729-0_17
Description: Multi-version concurrency control (MVCC) is now widely used in data warehouses to give OLAP queries and ETL maintenance flows concurrent access. A snapshot is taken of the existing warehouse tables so that a query can be answered independently of concurrent updates. In this work, we extend this snapshot with the deltas that reside at the source side of ETL flows. Before a query is answered, the relevant tables are first refreshed with exactly the source deltas that have been captured at the time the query arrives (the so-called query-driven policy). Snapshot maintenance is performed by an incremental recomputation pipeline that is flushed by a set of consecutive deltas belonging to a sequence of incoming queries. A workload scheduler is used to achieve a serializable schedule of concurrent maintenance tasks and OLAP queries. Performance has been examined using read-heavy and update-heavy workloads.
Database: OpenAIRE
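
The query-driven policy summarized in the abstract can be illustrated with a minimal, single-threaded sketch. It is not the authors' implementation: the names `Delta`, `Warehouse`, `refresh_to`, and `query` are hypothetical, the source change log is assumed to be an ordered list with sequence numbers, and queries are assumed to be processed in arrival order, which stands in for the serializable schedule produced by the workload scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """One captured source change (hypothetical structure)."""
    seq: int      # position in the source change log
    key: str      # row key in the warehouse table
    amount: int   # change to fold into the aggregated value


@dataclass
class Warehouse:
    totals: dict = field(default_factory=dict)  # key -> aggregated amount
    applied_seq: int = 0                        # last delta already applied

    def refresh_to(self, log, upto_seq):
        """Incrementally apply only the deltas in (applied_seq, upto_seq]."""
        for d in log:
            if self.applied_seq < d.seq <= upto_seq:
                self.totals[d.key] = self.totals.get(d.key, 0) + d.amount
        self.applied_seq = max(self.applied_seq, upto_seq)

    def query(self, log, arrival_seq, key):
        """Refresh with the deltas captured at query arrival, then read."""
        self.refresh_to(log, arrival_seq)
        return self.totals.get(key, 0)


# Assumed example data: three source deltas, two queries arriving in order.
log = [Delta(1, "a", 10), Delta(2, "b", 5), Delta(3, "a", -4)]
wh = Warehouse()
q1 = wh.query(log, arrival_seq=2, key="a")  # sees deltas 1 and 2 only
q2 = wh.query(log, arrival_seq=3, key="a")  # additionally sees delta 3
print(q1, q2)
```

Each query triggers only the maintenance work it needs: the second query applies a single new delta instead of recomputing the table from scratch.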