Perturbed robust linear estimating equations for confidentiality protection in remote analysis
Authors: | Atikur R. Khan, Sebastien Lucie, Soonmin Kwon, Christine M. O'Keefe, Tim Ayre, Soomin Song |
---|---|
Year of publication: | 2016 |
Subject: | Statistics and Probability; Data custodian; Computer science; 010401 analytical chemistry; Microdata (statistics); Policy analysis; 01 natural sciences; Data science; 0104 chemical sciences; Theoretical Computer Science; Robust regression; 010104 statistics & probability; Computational Theory and Mathematics; Respondent; Outlier; Confidentiality; Data mining; 0101 mathematics; Statistics, Probability and Uncertainty; Custodians |
Source: | Statistics and Computing, 27:775-787 |
ISSN: | 1573-1375 (electronic), 0960-3174 (print) |
DOI: | 10.1007/s11222-016-9653-2 |
Description: | National statistical agencies and other data custodians collect and hold vast amounts of survey and census data containing information vital for research and policy analysis. However, allowing analysis of these data while protecting respondent confidentiality has proved challenging. In this paper we focus on the remote analysis approach, under which a confidential dataset is held in a secure environment under the direct control of the data custodian agency. A computer system within the secure environment accepts a query from an analyst, runs it on the data, and returns the results to the analyst. In particular, the analyst has no direct access to the data and cannot view any microdata records. We further focus on fitting linear regression models to confidential data in the presence of outliers and influential points, which are common in business data. We propose a new method for protecting confidentiality in linear regression via a remote analysis system that provides additional confidentiality protection for outliers and influential points in the data. The method described in this paper was designed for the prototype DataAnalyser system developed by the Australian Bureau of Statistics; however, it would also be suitable for similar remote analysis systems. |
Database: | OpenAIRE |
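The title refers to perturbed robust linear estimating equations, but the abstract does not specify the estimator. As a generic illustration only, the following standard-library Python sketch fits a straight line by Huber M-estimation. It solves the estimating equation sum_i psi(r_i / s) (1, x_i) = u by iteratively reweighted least squares: u = (0, 0) gives an ordinary robust fit, while a nonzero u perturbs the equation itself rather than the reported coefficients. This is not the authors' published mechanism or the DataAnalyser implementation. The function names, the Huber constant c = 1.345, the MAD-type scale, and the Gaussian perturbation drawn in the hypothetical `remote_query` handler are all assumptions.

```python
import random
import statistics


def huber_weight(z, c=1.345):
    """IRLS weight psi(z) / z for Huber's psi function with tuning constant c."""
    return 1.0 if abs(z) <= c else c / abs(z)


def solve_weighted(x, y, w, shift=(0.0, 0.0)):
    """Solve sum_i w_i * (y_i - b0 - b1 * x_i) * (1, x_i) = shift for (b0, b1)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y)) - shift[0]
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) - shift[1]
    det = s0 * s2 - s1 * s1  # positive when all weights > 0 and x has 2+ distinct values
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def huber_fit(x, y, u=(0.0, 0.0), c=1.345, iters=100, tol=1e-12):
    """Solve sum_i psi(r_i / s) * (1, x_i) = u by iteratively reweighted least squares.

    u = (0, 0) is the ordinary Huber M-estimate; a nonzero u perturbs the
    estimating equation (an illustrative assumption, not the paper's scheme).
    """
    b0, b1 = solve_weighted(x, y, [1.0] * len(x))  # ordinary least squares start
    for _ in range(iters):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        s = statistics.median(abs(ri) for ri in r) / 0.6745  # MAD-type scale
        if s == 0.0:
            break  # a majority of points already fit exactly
        w = [huber_weight(ri / s, c) for ri in r]
        # Since w_i * r_i = s * psi(r_i / s), a fixed point of
        # sum_i w_i r_i (1, x_i) = s * u satisfies sum_i psi(r_i / s)(1, x_i) = u.
        nb0, nb1 = solve_weighted(x, y, w, (s * u[0], s * u[1]))
        converged = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1


def remote_query(x, y, noise_sd=1.0, seed=None):
    """Hypothetical server-side handler: the analyst receives only coefficients,
    never the microdata (x, y). The Gaussian draw of u is an assumption."""
    rng = random.Random(seed)
    u = (rng.gauss(0.0, noise_sd), rng.gauss(0.0, noise_sd))
    return huber_fit(x, y, u)


# Usage: the line y = 2 + 3x with one gross outlier at x = 9.
x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[9] += 100.0
print("OLS:   ", solve_weighted(x, y, [1.0] * len(x)))
print("Huber: ", huber_fit(x, y))
print("Remote:", remote_query(x, y, noise_sd=1.0, seed=1))
```

Because the Huber weights bound the influence of any single record, the outlier at x = 9 pulls the least-squares slope far from 3 but has little effect on the robust fit. The sketch makes no claim about how the perturbation should be calibrated to a confidentiality target.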