Clydesdale

Authors:	Tim Kaldewey, Sandeep Tata, Andrey Balmin
Year of publication:	2012
Subject:	Data model; Database; law; Computer science; Scala; Star schema; Schema (psychology); InformationSystems_DATABASEMANAGEMENT; A* search algorithm; computer.software_genre; computer; law.invention; computer.programming_language
Source:	SIGMOD Conference
DOI:	10.1145/2213836.2213938
Description:	Several recent proposals modify Hadoop, radically changing its storage organization or query processing techniques, to obtain good performance for structured data processing. We will showcase Clydesdale, a research prototype for structured data processing on Hadoop that can achieve dramatic performance improvements over existing solutions without any changes to the underlying MapReduce implementation. Clydesdale achieves this through a novel synthesis of several techniques from the database literature, carefully adapted to the Hadoop environment. On the star schema benchmark, we show that Clydesdale is on average 38x faster than Hive, the dominant approach for structured data processing on Hadoop today. To the best of our knowledge, Clydesdale is the fastest solution for processing workloads on structured data sets that fit a star schema on Hadoop. Attendees will be able to run queries on the data from the star schema benchmark on a remote Hadoop cluster with Clydesdale and Hive installed, and get a breakdown of the time taken to execute each query. Attendees will also be able to pose their own queries using ClyQL, a novel embedded DSL in Scala that can be used to rapidly prototype star join queries. With this demonstration, we hope to convince attendees that, contrary to previous belief, Hadoop can indeed efficiently support structured data processing.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::85b0596dec01b3fcb6fd7b7a514ce959 https://doi.org/10.1145/2213836.2213938