A Knowledge Acquisition Method for Improving Data Quality in Services Engagements
Author: | Tanveer A. Faruquie, Mohan N. Dani, K. Hima Prasad, Mukesh K. Mohania, L. Venkata Subramaniam, Rishabh Garg, Govind Kothari, Varsha N. Swamy |
---|---|
Year of publication: | 2010 |
Subject: | Service (systems architecture), Data cleansing, Standardization, Computer science, Context (language use), Ripple-down rules, Data science, Knowledge acquisition, Knowledge-based systems, Data quality, Software engineering |
Source: | IEEE SCC (IEEE International Conference on Services Computing) |
DOI: | 10.1109/scc.2010.91 |
Description: | Poor data quality is a serious problem for enterprises. Enterprise databases are large, so manual data cleansing is not feasible and the data must be cleansed automatically; this need has led to the development of commercial cleansing tools. However, offering data cleansing as a service has been difficult because the tool must be customized for each dataset: current commercial systems cannot incorporate the unique exceptions of different data sources, which makes it hard to migrate the underlying cleansing algorithms from one dataset to another. In this paper we focus on the address standardization task. We use the Ripple Down Rules (RDR) framework to reduce the manual effort required to rewrite the rules when moving from one source to another. The RDR framework allows us to incrementally patch existing rules or add exceptions without breaking other rules. We compare the RDR approach with an address standardization system based on conditional random fields (CRFs) and with an existing commercially available data cleansing tool. We demonstrate that RDR is an effective knowledge acquisition method and that adopting it for data cleansing can allow data cleansing to be offered as a service. |
Database: | OpenAIRE |
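The abstract's central claim is that RDR lets existing rules be patched incrementally, with exceptions added without breaking other rules. The sketch below illustrates that property with a generic single-classification RDR tree in Python. It assumes each case is a single address token. The names `Node`, `evaluate`, `classify` and `add_rule`, and the example rules, are hypothetical. This is not the authors' implementation and does not cover the CRF or commercial baselines.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical case type: a single address token (string).
Condition = Callable[[str], bool]


@dataclass
class Node:
    """One rule in a single-classification Ripple Down Rules (RDR) tree."""
    condition: Condition
    conclusion: str
    except_child: Optional[Node] = None  # refinement, tried only if this rule fires
    else_child: Optional[Node] = None    # alternative, tried only if it does not fire


def evaluate(root: Node, case: str) -> Tuple[str, Node, bool]:
    """Return the conclusion of the last rule that fired, plus the last node
    visited and whether it fired (this decides where a fix is attached)."""
    conclusion = root.conclusion  # the root is a default rule that always fires
    last, last_fired = root, True
    current = root.except_child
    while current is not None:
        last = current
        if current.condition(case):
            conclusion, last_fired = current.conclusion, True
            current = current.except_child
        else:
            last_fired = False
            current = current.else_child
    return conclusion, last, last_fired


def classify(root: Node, case: str) -> str:
    return evaluate(root, case)[0]


def add_rule(root: Node, case: str, condition: Condition, conclusion: str) -> Node:
    """Patch the tree for a misclassified case without editing existing rules."""
    if not condition(case):
        raise ValueError("the new rule must fire on the case that prompted it")
    _, last, last_fired = evaluate(root, case)
    node = Node(condition, conclusion)
    if last_fired:
        last.except_child = node  # exception to the rule that gave the wrong answer
    else:
        last.else_child = node    # new alternative at the end of the failed branch
    return node


# Usage with hypothetical rules for tagging address tokens.
root = Node(lambda t: True, "UNKNOWN")
add_rule(root, "12", str.isdigit, "HOUSE_NUMBER")
add_rule(root, "560001", lambda t: t.isdigit() and len(t) == 6, "POSTAL_CODE")
add_rule(root, "RD", lambda t: t.upper() in {"ST", "RD", "AVE"}, "STREET_TYPE")

for token in ["42", "560001", "ave", "Chennai"]:
    print(token, classify(root, token))
# 42 HOUSE_NUMBER
# 560001 POSTAL_CODE
# ave STREET_TYPE
# Chennai UNKNOWN
```

The six-digit postal-code rule is attached as an exception under the house-number rule, so the earlier conclusion for "42" is unchanged. Because a new rule only ever fills an empty slot in the tree, previously validated cases keep their classifications. This locality is what makes RDR attractive for adapting rules to a new data source.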