Popis: |
Governmental agencies that conduct surveys and censuses collect data from respondents in order to release them as statistical summaries. The more detailed a summary is, the more likely it is that a data intruder will be able to extract confidential information about individual respondents from the released data. However, there are various ways of redesigning the data product and/or modifying the data themselves to protect confidentiality while preserving usefulness. We discuss methods that achieve two goals: (i) a data intruder will not be able to extract, with high confidence, confidential data directly from the data product or derive confidential microdata from several data products; and (ii) the released data are still detailed enough to be useful to most data users, including researchers. Such “data-masking” methods comprise a fast-growing field often called statistical disclosure control. We discuss some simpler methods that have been used for decades, such as detail reduction, cell suppression, and data swapping; some methods developed in the 1990s, such as rank swapping, data shuffling, and multiplicative noise; and some methods developed in the past decade, such as randomization of microdata with constraints (PRAM) and synthetic data. Keywords: disclosure limitation; statistical disclosure control; data swapping; rank swapping; multiplicative noise; synthetic data; cell suppression |