An Approach for Schema Extraction of NoSQL Columnar Databases: the HBase Case Study

Authors: Eduardo Dias Defreyn, Angelo Augusto Frozza, Ronaldo dos Santos Mello
Year of publication: 2021
Subject:
Source: Journal of Information and Data Management. 12
ISSN: 2178-7107
DOI: 10.5753/jidm.2021.1966
Description: Although NoSQL databases do not require a schema a priori, knowing the database schema is essential for activities such as data integration, data validation, and data interoperability. This paper presents a process for extracting schemas from columnar NoSQL databases. We adopt JSON as a canonical format for data representation, and we validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. Compared to related work, we innovate by proposing a simple solution for inferring column data types in columnar NoSQL databases that store only byte arrays as column values, and by producing a resulting schema that follows the JSON Schema format.
Database: OpenAIRE
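
Illustrative sketch: the description mentions inferring column data types from byte-array values and emitting a schema in the JSON Schema format. The Python code below is one possible, minimal way to illustrate that idea; it is not the authors' prototype or algorithm. It assumes that cell values are UTF-8 text encodings (numbers written with HBase's binary encoders would need different handling), that rows have already been read into plain dictionaries of the form {column family: {qualifier: bytes}}, and that the draft-07 dialect of JSON Schema is an acceptable target. The names infer_type, merge_types, and extract_schema are hypothetical.

```python
import json


def infer_type(raw: bytes) -> str:
    """Guess a JSON Schema type name for one cell value stored as bytes.

    Assumption: values are UTF-8 text. Anything that cannot be decoded
    or parsed falls back to "string".
    """
    try:
        text = raw.decode("utf-8").strip()
    except UnicodeDecodeError:
        return "string"
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"


def merge_types(types):
    """Combine the types observed for one column across all rows."""
    types = set(types)
    if "number" in types:
        # Every integer is also a number, so keep only the wider type.
        types.discard("integer")
    if len(types) == 1:
        return next(iter(types))
    return sorted(types)  # JSON Schema allows a list of alternative types


def extract_schema(table_name, rows):
    """Build a JSON Schema document from rows shaped as
    {family: {qualifier: bytes}} (hypothetical in-memory layout)."""
    seen = {}  # family -> qualifier -> set of inferred type names
    for row in rows:
        for family, columns in row.items():
            for qualifier, raw in columns.items():
                seen.setdefault(family, {}).setdefault(qualifier, set()).add(
                    infer_type(raw)
                )
    properties = {
        family: {
            "type": "object",
            "properties": {
                qualifier: {"type": merge_types(types)}
                for qualifier, types in columns.items()
            },
        }
        for family, columns in seen.items()
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": table_name,
        "type": "object",
        "properties": properties,
    }


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample_rows = [
        {"info": {"name": b"Ana", "age": b"31"}},
        {"info": {"age": b"29.5", "active": b"true"}},
    ]
    print(json.dumps(extract_schema("users", sample_rows), indent=2))
```

In this sketch each column family becomes a nested object and each qualifier a property, so columns that appear in only some rows are still captured. Other mappings from the columnar model to JSON Schema are equally plausible.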