The Conceptual Integration Modelling Framework: Semantics and Query Answering

Author: Ekaterina, Guseva
Language: English
Year of publication: 2016
Document type: Master's thesis
DOI: 10.20381/ruor-3966
Description: In business intelligence (BI), accurate and accessible consolidation of information plays an important role. Integrating data from different sources involves transforming it according to constraints expressed in an appropriate language. The Conceptual Integration Modelling (CIM) framework is such a language. The CIM aims to let business users specify what information they need in a simplified and comprehensive language. Achieving this requires raising the level of abstraction to the conceptual level, so that users can pose queries in a conceptual query language (CQL). The CIM comprises three facets: a conceptual schema, expressed as an Extended Entity Relationship (EER) model (a high-level conceptual model used to design databases), against which users pose their queries; a relational multidimensional model that represents the data sources; and mappings between the conceptual schema and the sources. Such mappings can be specified in two ways. In the first, called global-as-view (GAV), the global schema is mapped to views over the relational sources by specifying how to obtain tuples of each global relation from tuples in the sources. In the second, called local-as-view (LAV), sources may contain less detailed (more aggregated) data, so the local relations are defined as views over the global relations. In this thesis, we address the expressibility and decidability of queries written in CQL. We first define the semantics of the CIM by translating the conceptual model into a set of first-order sentences containing a class of conceptual dependencies (CDs), namely tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs), together with certain first-order restrictions that express multidimensionality. Here, multidimensionality means that facts in a data warehouse can be described from different perspectives.
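For orientation, the two kinds of dependencies named above are usually written in the following general forms. This is the standard textbook notation, not necessarily the notation used in the thesis; the symbols \varphi and \psi stand for conjunctions of relational atoms, and the predicates Manager, Employee and HasName are hypothetical examples.

```latex
% General form of a tuple-generating dependency (TGD):
\forall \bar{x}\,\forall \bar{y}\,
  \bigl(\varphi(\bar{x},\bar{y}) \rightarrow \exists \bar{z}\,\psi(\bar{x},\bar{z})\bigr)

% General form of an equality-generating dependency (EGD),
% where x_i and x_j are variables occurring in \bar{x}:
\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow x_i = x_j\bigr)

% Hypothetical instances: a subtype constraint as a TGD,
% and a functional attribute as an EGD.
\forall x\,\bigl(\mathit{Manager}(x) \rightarrow \mathit{Employee}(x)\bigr)
\forall x\,\forall n\,\forall n'\,
  \bigl(\mathit{HasName}(x,n) \wedge \mathit{HasName}(x,n') \rightarrow n = n'\bigr)
```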
The EGDs assert equalities between tuples, and the TGDs assert that two instances are in a subtype association (more precise definitions are given later in the thesis). We use a non-conflicting class of conceptual dependencies, which guarantees that query answering is decidable. Non-conflicting dependencies avoid interaction between TGDs and EGDs. Our semantics extends the existing semantics defined for extended entity relationship models to the notions of fact, dimension category, dimensional hierarchy and dimension attribute. In addition, we define a class of conceptual queries and prove that answering them is decidable. DL-Lite has been used extensively for query rewriting because it reduces the complexity of query answering to AC0. Moreover, we present a query rewriting algorithm for the defined class of conceptual dependencies. Finally, we consider the problem under the GAV and LAV approaches and establish the corresponding query answering complexities. The query answering problem becomes decidable if we add certain constraints to the well-known class of EGDs and TGDs to guarantee summarizability. Under the GAV approach to mapping, query answering has AC0 data complexity and EXPTIME combined complexity. Under the LAV approach, the problem becomes coNP-hard.
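The GAV idea described in the abstract, where tuples of a global relation are obtained from tuples in the sources, can be sketched in a few lines of Python. This is only a toy illustration under assumed names: the source relations SALES_EU and SALES_US, the global relation Sale(product, region, amount) and the query are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical relational sources (names and data are illustrative only).
SALES_EU = [("widget", 10), ("gadget", 5)]
SALES_US = [("widget", 7)]


def global_sale():
    """GAV mapping: the global relation Sale(product, region, amount)
    is defined as a view (here, a union) over the source relations."""
    for product, amount in SALES_EU:
        yield (product, "EU", amount)
    for product, amount in SALES_US:
        yield (product, "US", amount)


def total_by_product():
    """A query posed against the global relation only; the sources
    are reached solely through the view definition above."""
    totals = defaultdict(int)
    for product, _region, amount in global_sale():
        totals[product] += amount
    return dict(totals)


print(total_by_product())  # {'widget': 17, 'gadget': 5}
```

Under GAV, answering the query amounts to unfolding the view definition, as the sketch does. Under LAV the direction is reversed: each source is described as a view over the global relations, so the global tuples are not determined directly by the sources, which is consistent with the higher complexity reported for that setting.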
Database: Networked Digital Library of Theses & Dissertations