A High-Performance Distributed Relational Database System for Scalable OLAP Processing
Author: | Jason Arnold, Ioan Raicu, Boris Glavic |
---|---|
Year of publication: | 2019 |
Subject: |
SQL
Distributed database; Computer science; Relational database; Distributed computing; Online analytical processing; InformationSystems_DATABASEMANAGEMENT; 020207 software engineering; 02 engineering and technology; computer.software_genre; Data set; Relational database management system; Spark (mathematics); Scalability; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Massively parallel; computer; computer.programming_language |
Source: | IPDPS |
Description: | The scalability of systems such as Hive and Spark SQL that are built on top of big data platforms has enabled query processing over very large data sets. However, the per-node performance of these systems is typically low compared to traditional relational databases. Conversely, Massively Parallel Processing (MPP) databases do not scale as well as these systems. We present HRDBMS, a fully implemented distributed shared-nothing relational database developed with the goal of improving the scalability of OLAP queries. HRDBMS achieves high scalability through a principled combination of techniques from relational and big data systems with novel communication and work-distribution techniques. While we also support serializable transactions, the system has not been optimized for this use case. HRDBMS runs on a custom distributed and asynchronous execution engine that was built from the ground up to support highly parallelized operator implementations. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS's scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases like Greenplum. |
Database: | OpenAIRE |
External link: |