A Schema Integration Approach for Big Data Analysis.

Authors: Amghar, Souad; Cherdal, Safae; Mouline, Salma
Subject:
Source: Ingénierie des Systèmes d'Information; Apr 2023, Vol. 28 Issue 2, p315-325, 11p
Abstract: Organizations analyze huge volumes of data to understand their clients and improve their services. In many cases, these data are stored separately in different database systems and must be integrated before they can be used in analysis tools or prediction applications. One of the main tasks of the data integration process is the definition of the global schema. Defining a global schema in the context of NoSQL systems is a demanding task, since it requires dealing with a variety of issues, including the lack of local schemas, data model heterogeneity, and semantic heterogeneity. To address these challenges, this work aims to automatically define the global schema of a set of databases stored in heterogeneous NoSQL systems. The main contributions of this work are presented in three phases: (1) schema extraction, where we define the local schemas using a unified representation; (2) schema matching, in which we propose a hybrid approach to find matching attributes between the local schemas; and (3) schema integration, where we define the global schema using the schema matching results. A Covid-19 use case and other benchmarks are presented in this paper to evaluate the results of the proposed approach and illustrate its effectiveness. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
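
Illustrative note: the three phases named in the abstract (schema extraction, schema matching, schema integration) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' method. The abstract does not detail the unified representation or the hybrid matcher, so the sketch assumes a flat mapping from attribute name to observed type names, and it substitutes plain name similarity (Python's standard `difflib.SequenceMatcher`) for the hybrid matching. All function names, the 0.8 threshold, and the sample records are hypothetical.

```python
from difflib import SequenceMatcher


def extract_schema(documents):
    """Phase 1 (assumed form): infer attribute name -> set of type names from sample records."""
    schema = {}
    for doc in documents:
        for key, value in doc.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def normalize(name):
    """Lower-case a name and drop separators so 'patient_id' and 'patientId' compare equal."""
    return name.lower().replace("_", "").replace("-", "")


def match_attributes(schema_a, schema_b, threshold=0.8):
    """Phase 2 (stand-in): pair each attribute of A with its most similar name in B.

    Uses string similarity only; the paper's hybrid approach is not reproduced here.
    """
    matches = {}
    for a in schema_a:
        best, best_score = None, 0.0
        for b in schema_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            matches[a] = best
    return matches


def integrate(schema_a, schema_b, matches):
    """Phase 3 (assumed form): merge matched attributes, copy the unmatched ones."""
    global_schema = {a: set(types) for a, types in schema_a.items()}
    for a, b in matches.items():
        global_schema[a] |= schema_b[b]
    matched_b = set(matches.values())
    for b, types in schema_b.items():
        if b not in matched_b:
            global_schema.setdefault(b, set()).update(types)
    return global_schema


# Hypothetical sample records from two schemaless stores.
docs_a = [{"patient_id": 1, "country": "MA", "cases": 10}]
docs_b = [{"patientId": "p1", "Country": "FR", "deaths": 2}]

schema_a = extract_schema(docs_a)
schema_b = extract_schema(docs_b)
matches = match_attributes(schema_a, schema_b)
global_schema = integrate(schema_a, schema_b, matches)
```

In this sketch the global schema keeps the attribute names of the first source for matched pairs and records every type observed for them, so a type conflict (here an integer versus a string identifier) stays visible rather than being silently resolved.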