DQ2S - A framework for data quality-aware information management

Author: Pedro Sampaio, Chao Dong, Sandra Sampaio
Language: English
Year of publication: 2015
Subject:
Source: Sampaio, S F M, Dong, C & Sampaio, P 2015, 'DQ2S - A framework for data quality-aware information management', Expert Systems with Applications, vol. 42, no. 21, pp. 8304-8326. https://doi.org/10.1016/j.eswa.2015.06.050
DOI: 10.1016/j.eswa.2015.06.050
Description: Design of a data quality-aware information management framework and system. Users measure data quality based on an extensible set of data profiling algorithms. Query language, system architecture and heuristic optimization approach developed. System design based on seamless extensions to SQL and relational database systems. Applied in e-Business scenarios and potential for big data profiling discussed. This paper describes the design and implementation of the Data Quality Query System (DQ2S), a query processing framework and tool that incorporates data quality profiling functionality into the processing of queries written with quality-aware query language extensions. DQ2S supports the combination of performance-oriented and quality-oriented query optimizations, and provides a query processing platform on which advanced data profiling queries can be formulated using well-established query language constructs of the kind commonly used with relational database management systems. DQ2S encompasses a declarative query language and a data model that let users express constraints on the quality of query results and query quality-related information, a set of algebraic operators for manipulating data quality-related information, and optimization heuristics. The proposed query language and algebra represent seamless extensions to SQL and relational database engines, respectively. The constructs of the proposed data model are implemented at the user's view level and are internally mapped into relational model constructs. The quality-aware extensions and features are useful when users need to assess the quality of relational data sets and define quality constraints for acceptable data before using candidate data sources in decision support systems and big data analytical tasks.
Database: OpenAIRE