Evaluating Redundancy and Partitioning of Geospatial Data in Document-Oriented Data Warehouses

Authors: Robson do Nascimento Fidalgo, Rinaldo Lima, Marcio Ferro
Year of publication: 2019
Subject:
Source: Big Data Analytics and Knowledge Discovery (DaWaK), ISBN: 9783030275198
Description: A Geospatial Data Warehouse (GDW) is a repository of historical and geospatial data used in the decision-making process. These systems manage large volumes of data, and their dimensions are usually denormalized to increase query performance. Many studies have analyzed the impact of geospatial data redundancy on a relational GDW. However, to the best of our knowledge, no previous study has performed a similar analysis for the NoSQL scenario. In this context, to design a scalable document-oriented GDW (DGDW) with low storage cost and low query response time, it is important to identify which geospatial fields should be normalized (referenced) or denormalized (embedded), as well as how the documents should be partitioned among collections. In this study, we exhaustively evaluated 36 DGDWs in the MongoDB document-oriented database with different levels of geospatial redundancy and different approaches to partitioning documents among collections. Our experimental results indicate that both the normalization of low-selectivity geospatial fields and the partitioning of documents into homogeneous collections yield better query performance and require less storage space. The performance evaluation presented in this paper provides strong evidence that can help guide the creation of a DGDW.
Database: OpenAIRE
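
Illustrative note: the description contrasts denormalized (embedded) and normalized (referenced) geospatial fields. The sketch below is not the authors' schema or benchmark. It uses plain Python dictionaries in place of MongoDB documents, and every name and value in it (`city`, `amount`, `city_id`, the unit-square polygon) is a hypothetical chosen for illustration. Serialized JSON length is used only as a rough stand-in for storage cost; it is not BSON size and says nothing about query time.

```python
import json

# Hypothetical geospatial dimension member shared by many fact documents.
city = {
    "_id": "city-1",
    "name": "Example City",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
        ],
    },
}


def embedded_facts(n):
    """Denormalized design: each fact document carries its own copy of the geometry."""
    return [
        {
            "_id": i,
            "amount": 10 * i,
            "city": {"name": city["name"], "geometry": city["geometry"]},
        }
        for i in range(n)
    ]


def referenced_facts(n):
    """Normalized design: facts keep only a key; the geometry is stored once."""
    facts = [{"_id": i, "amount": 10 * i, "city_id": city["_id"]} for i in range(n)]
    return facts, [city]


def json_size(docs):
    """Total length of the JSON serialization, a crude proxy for storage cost."""
    return sum(len(json.dumps(doc)) for doc in docs)


embedded = embedded_facts(1000)
facts, cities = referenced_facts(1000)

embedded_bytes = json_size(embedded)
referenced_bytes = json_size(facts) + json_size(cities)

print("embedded:", embedded_bytes, "referenced:", referenced_bytes)
```

In the referenced layout the two lists would map to two separate collections, one holding only fact documents and one holding only city documents, which is one simple reading of partitioning documents into homogeneous collections. The price is an extra lookup whenever a query needs the geometry.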