Statistical Tradeoffs between Generalization and Suppression in the De-identification of Large-Scale Data Sets

Authors: Olivia Angiuli, James H. Waldo
Year of publication: 2016
Subject:
Source: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC).
Description: Data sets containing private information about individuals must satisfy privacy standards before being publicly released. One such standard, k-anonymity, reduces the probability of the re-identification of individuals by requiring that rare combinations of personally-identifiable information be represented by at least k distinct individuals. Records that violate this standard must be altered, which can lead to significant distortion of the statistical properties of the data set. In this paper, we discuss improvements to two techniques used to achieve k-anonymity, generalization and suppression, that confer k-anonymity while better preserving the statistical properties of an educational data set taken from a massive online open course platform, edX.
Database: OpenAIRE
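
The two techniques named in the abstract, generalization and suppression, can be illustrated with a minimal sketch. It is not the authors' method: the toy records, the choice of `age` and `country` as quasi-identifiers, the ten-year age buckets, and the helper names (`group_sizes`, `is_k_anonymous`, `generalize_age`, `suppress`) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records; "age" and "country" are assumed quasi-identifiers.
records = [
    {"age": 23, "country": "US"},
    {"age": 27, "country": "US"},
    {"age": 24, "country": "US"},
    {"age": 41, "country": "FR"},
    {"age": 45, "country": "FR"},
    {"age": 67, "country": "IS"},
]
QUASI_IDENTIFIERS = ("age", "country")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def is_k_anonymous(rows, keys, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return all(size >= k for size in group_sizes(rows, keys).values())


def generalize_age(rows, width=10):
    """Generalization: replace an exact age with a bucket such as '20-29'."""
    generalized = []
    for row in rows:
        low = (row["age"] // width) * width
        generalized.append({**row, "age": f"{low}-{low + width - 1}"})
    return generalized


def suppress(rows, keys, k):
    """Suppression: drop rows whose quasi-identifier group has fewer than k rows."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[key] for key in keys)] >= k]


generalized = generalize_age(records)
released = suppress(generalized, QUASI_IDENTIFIERS, k=2)
```

In this toy data, suppression alone would discard every record, because each exact age is unique. Generalizing ages first leaves a single record to suppress, which is the kind of tradeoff between the two techniques that the abstract describes.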