Description: |
Companies and government agencies frequently own data sets containing personal information about clients, survey respondents, or users of a product. Sometimes these organizations are required, or wish, to release anonymized versions of this information to the public. Before releasing these data, they apply established privacy-preservation methods such as binning, data perturbation, and data suppression to maintain the anonymity of clients, customers, or survey participants. However, existing work has shown that common privacy-preserving measures fail when anonymized data are combined with data from online social networks, social media sites, and data aggregation sites. This paper introduces a methodology for determining how vulnerable individuals in a pre-released data set are to reidentification using public data. As part of this methodology, we propose novel metrics to quantify the amount of information that can be gained by combining pre-released data with publicly available online data. We then investigate how to use these metrics to identify individuals in the data set who may be particularly vulnerable to this form of data combination. We demonstrate the effectiveness of our methodology on a real-world data set using public data from both social networking and data aggregation sites. |