Algebricks

Authors: Nicola Onose, Vinayak Borkar, Michael J. Carey, Pouria Pirzadeh, Till Westmann, Yingyi Bu, E. Preston Carman, Vassilis J. Tsotras
Publication year: 2015
Subject:
Source: SoCC
DOI: 10.1145/2806777.2806941
Description: A number of high-level query languages, such as Hive, Pig, Flume, and Jaql, have been developed in recent years to increase analyst productivity when processing and analyzing very large datasets. The implementation of each of these languages includes a complete, data model-dependent query compiler, yet each involves a number of similar optimizations. In this work, we describe a new query compiler architecture that separates language-specific and data model-dependent aspects from a more general query compiler backend that can generate executable data-parallel programs for shared-nothing clusters and can be used to develop multiple languages with different data models. We have built such a data model-agnostic query compiler substrate, called Algebricks, and have used it to implement three different query languages (HiveQL, AQL, and XQuery) to validate the efficacy of this approach. Experiments show that all three query languages benefit from the parallelization and optimization that Algebricks provides and thus have good parallel speedup and scaleup characteristics for large datasets.
Database: OpenAIRE