Implicit Dedupe Learning Method on Contextual Data Quality Problems
Authors: | Daouda Ahmat Mahamat, Alladoumbaye Ngueilbaye, Hongzhi Wang, Roland Madadjim |
---|---|
Year of publication: | 2021 |
Subject: | Computer science, Big data, Machine learning, Variety (cybernetics), Information extraction, Contextual design, Data quality, Data deduplication, Web application, Quality (business), Artificial intelligence |
Source: | Advances in Data Science and Information Engineering, ISBN 9783030717032 |
Description: | A variety of applications, such as information extraction, data mining, e-learning, and web applications, rely on heterogeneous and distributed data. As a result, the use of this data is complicated by duplicate records. To address this problem, the present study proposes a novel dedupe learning method (DLM), together with other algorithms, to detect and correct contextual data quality anomalies. The method was designed and implemented for structured data. The proposed methods identified and corrected more data anomalies than current taxonomy techniques. Consequently, they could be valuable for detecting and correcting errors in large-scale contextual data (big data). |
Database: | OpenAIRE |