Application of clustering and association methods in data cleaning

Author:	L. Ciszak
Publication year:	2008
Subject:	Data cleansing; Reference data; Validation rule; Computer science; Data stream mining; Data quality; Data mining; Data pre-processing; Cluster analysis; Data warehouse
Source:	IMCSIT
ISSN:	1896-7094
Description:	Data cleaning is the process of maintaining data quality in information systems. Current data cleaning solutions require reference data to identify incorrect or duplicate entries. This article proposes using data mining for data cleaning, as an effective way to discover reference data and validation rules from the data itself. Two algorithms designed by the author for data attribute correction are presented; both rely on data mining methods. Experimental results show that both algorithms can effectively clean text attributes without external reference data.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::cf9c3b4e0be21890e9bd39038f1ebd85 https://doi.org/10.1109/imcsit.2008.4747224