AQUA+: Query Optimization for Hybrid Database-MapReduce System
Author: | Sai Wu, Zhifei Pang, Haichao Huang, Yuqing Xie, Zhouzhenyan Hong |
---|---|
Year of publication: | 2019 |
Subject: |
Artificial neural network; Exploit; Database; Distributed database; business.industry; Computer science; Interface (computing); 02 engineering and technology; Query optimization; computer.software_genre; Human-Computer Interaction; Query plan; Artificial Intelligence; Hardware and Architecture; 020204 information systems; Hybrid system; Computer data storage; 0202 electrical engineering electronic engineering information engineering; business; Distributed File System; computer; Software; Information Systems; Database engine |
Source: | ICBK |
DOI: | 10.1109/icbk.2019.00034 |
Description: | MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. However, many existing applications maintain their data in a distributed database, and it is costly to export that data into the storage system of MapReduce (normally a distributed file system). Moreover, compared to MapReduce, a database is equipped with many state-of-the-art techniques, such as indexes and a query optimizer. Therefore, a hybrid Database-MapReduce system inheriting the advantages of both systems is preferred. In this paper, we propose AQUA+, a query optimizer tailored for the hybrid system. AQUA+ is an extension of our previous system, AQUA. It generates a plan that adaptively assigns operators to the database engine and the MapReduce engine to optimize performance. The intuition is to exploit the indexes, co-partitioning, and other features provided by the database as much as possible and to reduce the data volume processed by MapReduce. Due to the complexity of query optimization, AQUA+ introduces a novel tuning technique, learning to optimize. In particular, two neural networks are trained, one to predict cost and one to refine the query plan. We train them on our log of real query processing. Experiments carried out on our in-house cluster confirm the effectiveness of our query optimizer. |
Database: | OpenAIRE |
External link: |