Privacy-preserving data mining for open government data from heterogeneous sources

Autor: Arslan, Şuayb Şefik, Birgili, Bengi, Tosun, Petek
Přispěvatelé: Arslan, Şuayb Şefik
Rok vydání: 2021
Předmět:
Zdroj: Government Information Quarterly. 38:101544
ISSN: 0740-624X
DOI: 10.1016/j.giq.2020.101544
Popis: Open data is a global movement with the potential to generate significant social and economic benefits. Policies on open government data (OGD) inspire the development of new and innovative services that government agencies may lack. The International Open Data Charter adequately describes the importance of data mining. Governments that have signed this charter should focus on the following areas—(i) data mining, (ii) linkage, and (iii) in-depth analysis, i.e., distribution of open data that is freely accessible for elaborate analysis using machine reading. However, a series of practical difficulties is observed in connection with the data mining of OGD for in-depth analysis. First, most OGD do not have identifiers to prevent privacy disclosure. Second, owing to the nature of siloed data, the data sharing and collection methods vary with respect to heterogeneous OGD, and administrative or institutional barriers need to be overcome. This has created a demand for a novel technical solution that applies micro-aggregation and distance-based record linkage to address the aforementioned issues. Thus, in this study, a method capable of integrating two or more de-identified OGDs into one dataset to enable OGD data mining is proposed. In addition, the proposed method allows users to adjust the privacy threshold level to determine an appropriate balance between privacy disclosure risk and data utility. The effectiveness of the method is evaluated in terms of several metrics via extensive experimentation. This study emphasizes the importance of the research on efficient utilization of already-published OGDs, which has been relatively neglected in the past. Further, it broadens the research area for privacy-preserving data mining by proposing a method capable of mining heterogeneous data even in the absence of identifiers. http://openaccess.mef.edu.tr/xmlui/discover
Databáze: OpenAIRE