Popis: |
Data management applications are growing rapidly and demand increasing attention, especially in the big data era. It is therefore critical to support these applications with novel and efficient algorithms that deliver higher performance. Array database management systems are one way to support these applications, since they handle data represented in n-dimensional data structures. For instance, software such as SciDB and RasDaMan can be a powerful tool for achieving the required performance on large-scale problems with multidimensional data. Like their relational counterparts, these systems provide dedicated array query languages as the user interface. Further, as a popular programming model, MapReduce enables large-scale data analysis and has also been leveraged to facilitate query processing and to serve as a database engine. Nevertheless, one major obstacle is the low productivity of developing MapReduce applications. Unlike high-level declarative languages such as SQL, MapReduce jobs are written in low-level languages, often requiring substantial programming effort and complicated debugging. This paper presents a system that translates array queries expressed in AQL (Array Query Language) in SciDB into MapReduce jobs. We focus on effectively translating several unique structural aggregations, including circular, grid, hierarchical, and sliding aggregations. Unlike the traditional aggregations in relational databases, these structural aggregations are designed explicitly for array manipulation. Thus, our work can be considered an array-view counterpart of existing SQL-to-MapReduce translators such as HiveQL/Hive and YSmart. We show that our translator can effectively support structural aggregations over arrays (or sub-arrays) to meet a variety of array manipulation needs.
Moreover, our translator supports user-defined aggregation functions with minimal effort from the user. We also show that our translator can generate optimized MapReduce code that performs significantly better than short hand-written code, by up to 10.84X. |