Privacy-preserving data mining for open government data from heterogeneous sources
Author: | Arslan, Şuayb Şefik, Birgili, Bengi, Tosun, Petek |
---|---|
Contributors: | Arslan, Şuayb Şefik |
Year of publication: | 2021 |
Subject: |
Open government
Government; Sociology and Political Science; Computer science; 05 social sciences; Charter; Linkage (mechanical); Library and Information Sciences; computer.software_genre; 0506 political science; law.invention; Data sharing; Identifier; Open data; Open government data; Heterogeneous data sources; law; 050602 political science & public administration; Data mining; 0509 other social sciences; 050904 information & library sciences; Law; computer; Record linkage |
Source: | Government Information Quarterly. 38:101544 |
ISSN: | 0740-624X |
DOI: | 10.1016/j.giq.2020.101544 |
Description: | Open data is a global movement with the potential to generate significant social and economic benefits. Policies on open government data (OGD) inspire the development of new and innovative services that government agencies may lack. The International Open Data Charter adequately describes the importance of data mining. Governments that have signed this charter should focus on the following areas: (i) data mining, (ii) linkage, and (iii) in-depth analysis, i.e., distribution of open data that is freely accessible for elaborate analysis using machine reading. However, a series of practical difficulties is observed in connection with the data mining of OGD for in-depth analysis. First, most OGD do not have identifiers to prevent privacy disclosure. Second, owing to the nature of siloed data, the data sharing and collection methods vary with respect to heterogeneous OGD, and administrative or institutional barriers need to be overcome. This has created a demand for a novel technical solution that applies micro-aggregation and distance-based record linkage to address the aforementioned issues. Thus, in this study, a method capable of integrating two or more de-identified OGDs into one dataset to enable OGD data mining is proposed. In addition, the proposed method allows users to adjust the privacy threshold level to determine an appropriate balance between privacy disclosure risk and data utility. The effectiveness of the method is evaluated in terms of several metrics via extensive experimentation. This study emphasizes the importance of the research on efficient utilization of already-published OGDs, which has been relatively neglected in the past. Further, it broadens the research area for privacy-preserving data mining by proposing a method capable of mining heterogeneous data even in the absence of identifiers. http://openaccess.mef.edu.tr/xmlui/discover |
Database: | OpenAIRE |
External link: |