Implicit Dedupe Learning Method on Contextual Data Quality Problems

Authors: Daouda Ahmat Mahamat, Alladoumbaye Ngueilbaye, Hongzhi Wang, Roland Madadjim
Year of publication: 2021
Subject:
Source: Advances in Data Science and Information Engineering ISBN: 9783030717032
Description: A variety of applications, such as information extraction, data mining, e-learning, and web applications, use heterogeneous and distributed data. As a result, the use of such data is hampered by duplicate records. To address this issue, the present study proposes a novel dedupe learning method (DLM) and other algorithms to detect and correct contextual data quality anomalies. The method was designed and implemented on structured data. The proposed methods identified and corrected more data anomalies than current taxonomy techniques. Consequently, they could be important for detecting and correcting errors in broad contextual data (big data).
Database: OpenAIRE