A Clustering Based Anonymization Model for Big Data

Authors:	Aydincan Kalyoncu, Mucahid Ercimen, Yavuz Canbay, Adem Dogan, Seref Sagiroglu
Language:	English
Year of publication:	2019
Subject:	Information sensitivity; Publishing; business.industry; Computer science; Order (business); Spark (mathematics); Big data; k-anonymity; business; Cluster analysis; Data science; Publication
Description:	Today, many institutions collect and store big data belonging to their respondents (clients, patients, users, firms, etc.). They do so mainly to carry out their missions and to provide better services (modeling, extracting behavior patterns, disease detection, making future plans, creating policies, developing decision-making mechanisms). To benefit more fully from the collected big data, publishing the data is unavoidable. However, if the big data includes sensitive information about respondents, releasing it directly may disclose their identities. Hence, new solutions to protect the privacy of respondents are always required. Anonymization is a utility-based privacy-preserving approach that is frequently used in privacy-preserving big data publishing (PPBDP). In this paper, a clustering-based anonymization model on Spark is proposed and applied for the first time. The main purpose of the proposed approach is to treat the anonymization problem as a clustering problem. A distributed k-Means algorithm is used for anonymization in the proposed model. In order to adopt a clustering-based approach to k-anonymity, some assumptions were made. As a result, the proposed model provides a plausible solution to PPBDP.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ef193194aa39dfa9faad2a16808033ba https://avesis.gazi.edu.tr/publication/details/de64081a-83c5-45e2-a0a6-7ff9d238a654/oai