Discovering Most Important Data Quality Dimensions Using Latent Semantic Analysis

Authors: Carlisle George, Suraj Juddoo
Year of publication: 2018
Subject:
Source: 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD).
DOI: 10.1109/icabcd.2018.8465129
Description: Big Data quality is an emerging field. Many authors now agree that data quality remains highly relevant, even for Big Data uses. However, frameworks and guidelines on how to carry out Big Data quality initiatives are lacking. The starting point of any data quality work is to determine the properties of data quality, termed data quality dimensions (DQDs). Even these dimensions lack rigorous definitions in the existing literature. This research aims to help identify the most important DQDs for Big Data in the health industry. It continues a previous work that identified the five most important DQDs using a human-judgement-based technique known as the inner hermeneutic cycle. To remove potential bias arising from human judgement, this research uses the same set of literature but applies latent semantic analysis, a statistical technique for extracting knowledge from a set of documents. The results confirm only two of the five previously identified most important DQDs, namely accuracy and completeness.
Database: OpenAIRE
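
Note: The description names latent semantic analysis (LSA) without giving the procedure. As a rough illustration only, and not the authors' pipeline, the sketch below builds a term-document count matrix for a few candidate dimension terms and ranks them by their loading on the leading latent component, computed by power iteration (a rank-1 truncation of the singular value decomposition). The corpus `DOCS`, the list `CANDIDATES`, and all function names are hypothetical; a fuller analysis would typically apply TF-IDF weighting and retain several components.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's actual literature set is not reproduced here.
DOCS = [
    "accuracy and completeness of patient records",
    "completeness and timeliness of hospital data",
    "accuracy of clinical measurements and completeness of fields",
    "timeliness of big data streams",
]

# Hypothetical candidate data quality dimension terms.
CANDIDATES = ["accuracy", "completeness", "timeliness", "consistency"]


def term_document_matrix(docs, terms):
    """Rows are terms, columns are documents, entries are raw term counts."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    return [[float(c[term]) for c in counts] for term in terms]


def top_singular_triplet(a, iterations=100):
    """Power iteration on A A^T; returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    u = [1.0 / math.sqrt(rows)] * rows
    for _ in range(iterations):
        # v = A^T u, then w = A v, so w = (A A^T) u
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # All-zero matrix: no latent structure to extract.
            return 0.0, u, v
        u = [x / norm for x in w]
    v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    sigma = math.sqrt(sum(x * x for x in v))
    if sigma > 0.0:
        v = [x / sigma for x in v]
    return sigma, u, v


def rank_terms(docs, terms):
    """Rank terms by the magnitude of their loading on the leading latent component."""
    a = term_document_matrix(docs, terms)
    sigma, u, _ = top_singular_triplet(a)
    scores = [(term, sigma * abs(x)) for term, x in zip(terms, u)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for term, score in rank_terms(DOCS, CANDIDATES):
        print(f"{term}: {score:.3f}")
```

The ranking produced here reflects only the invented toy corpus and carries no evidential weight for the paper's findings.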