Statistical Tradeoffs between Generalization and Suppression in the De-identification of Large-Scale Data Sets
Authors: | Olivia Angiuli, James H. Waldo |
---|---|
Year of publication: | 2016 |
Subject: | 0301 basic medicine; Information privacy; Computer science; Privacy software; Generalization; De-identification; 02 engineering and technology; computer.software_genre; Data set; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Data mining; computer; Private information retrieval; Anonymity |
Source: | 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC). |
Description: | Data sets containing private information about individuals must satisfy privacy standards before being publicly released. One such standard, k-anonymity, reduces the probability of the re-identification of individuals by requiring that rare combinations of personally-identifiable information be represented by at least k distinct individuals. Records that violate this standard must be altered, which can lead to significant distortion of the statistical properties of the data set. In this paper, we discuss improvements to two techniques used to achieve k-anonymity, generalization and suppression, that confer k-anonymity while better preserving the statistical properties of an educational data set taken from a massive online open course platform, edX. |
Database: | OpenAIRE |
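
The k-anonymity standard summarized in the description can be illustrated with a small sketch. This is not the authors' method or their edX data: the field names (`age`, `country`), the sample records, the bucket width, and the helper functions are all hypothetical, chosen only to show how generalization (coarsening a value) can reduce the number of records that suppression (deleting a record) would otherwise remove.

```python
from collections import Counter

# Hypothetical quasi-identifier columns and toy records (not from the paper).
QI = ("age", "country")
records = [
    {"age": 23, "country": "US"},
    {"age": 27, "country": "US"},
    {"age": 25, "country": "US"},
    {"age": 41, "country": "FR"},
]


def violating_groups(rows, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def suppress(rows, quasi_identifiers, k):
    """Suppression: drop every row whose combination violates k-anonymity."""
    bad = violating_groups(rows, quasi_identifiers, k)
    return [r for r in rows if tuple(r[q] for q in quasi_identifiers) not in bad]


def generalize_age(row, width=10):
    """Generalization: replace an exact age with a bucket such as '20-29'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Suppression alone removes every record, because each exact age is unique.
kept_raw = suppress(records, QI, k=2)

# Generalizing ages first lets three of the four records survive suppression.
generalized = [generalize_age(r) for r in records]
kept_generalized = suppress(generalized, QI, k=2)
```

In this toy case, suppression alone discards the whole data set, while generalizing first keeps three records at the cost of less precise ages; that is the kind of tradeoff between the two techniques the description refers to.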