Method for the Assessment of Semantic Accuracy Using Rules Identified by Conditional Functional Dependencies
Author: | Fabio Silva Lopes, Vanusa S. Santana |
---|---|
Year of publication: | 2019 |
Subject: |
Computer science
Process (engineering); Context (language use); Engineering and technology; Set (abstract data type); Resource (project management); Knowledge extraction; Information systems; Data quality; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Quality (business); Data mining; Dimension (data warehouse) |
Source: | Metadata and Semantic Research (MTSR), ISBN: 9783030365981 |
DOI: | 10.1007/978-3-030-36599-8_25 |
Description: | Data is a central resource of organizations, which makes data quality essential for their intellectual growth. Quality is a multifaceted concept that generally refers to fitness for use. Quality evaluation therefore rests on a set of quality rules derived from business criteria. However, it may be impossible to specify these rules manually. Conditional Functional Dependencies (CFDs) make it possible to identify context-dependent quality rules automatically. This paper presents a method for assessing data quality that uses CFDs to extract quality rules and identify inconsistencies. The method evaluates the quality of the database in the semantic accuracy dimension. It consolidates the knowledge discovery process with data quality assessment, listing the activities that result in the quantification of semantic accuracy. An instance of the method was demonstrated on air quality monitoring data. The evaluation showed that the CFD rules were able to reflect some atmospheric phenomena, yielding interesting context-dependent rules. These patterns in the transactions, which may be unknown to users, can be used as input for the evaluation and monitoring of data quality. |
Database: | OpenAIRE |