Dynamic speculative optimizations for SQL compilation in Apache Spark

Authors:	Walter Binder, Filippo Schiavio, Daniele Bonetta
Year of publication:	2020
Subject:	SQL; Computer science; General Engineering; 020207 software engineering; 02 engineering and technology; computer.software_genre; JSON; Data access; 020204 information systems; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Operating system; Benchmark (computing); Code generation; Compiler; computer; Machine code; computer.programming_language
Source:	Proceedings of the VLDB Endowment. 13:754-767
ISSN:	2150-8097
DOI:	10.14778/3377369.3377382
Description:	Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant runtime overheads related to data access and deserialization. This performance penalty can be significant, especially when applications operate on human-readable data formats such as CSV or JSON. In this paper we present a new approach to query compilation that overcomes these limitations by relying on runtime profiling and dynamic code generation. Our new SQL compiler for Spark produces highly efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual data formats such as CSV or JSON.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::b8abf53d040f205ce3925f760ea533bc https://doi.org/10.14778/3377369.3377382