Reducing patient re-identification risk for laboratory results within research datasets.

Authors: Atreya RV (Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232-8340, USA; ravi.v.atreya@vanderbilt.edu); Smith JC; McCoy AB; Malin B; Miller RA
Language: English
Source: Journal of the American Medical Informatics Association : JAMIA [J Am Med Inform Assoc] 2013 Jan 01; Vol. 20 (1), pp. 95-101. Date of Electronic Publication: 2012 Jul 21.
DOI: 10.1136/amiajnl-2012-001026
Abstract: Objective: To lower patient re-identification risk in biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.
Materials and Methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, we used the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients. We examined the uniqueness of unaltered laboratory results in the dataset and then applied two data perturbation models: simple random offsets and an expert-derived, clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. For each model's perturbed laboratory results, we assessed the re-identification risk and the retention of clinical meaning.
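The threat model and the simple random-offset perturbation described above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the offset distribution or the ranking function, so the uniform relative offset, the relative-distance score, and the names `perturb_random_offset`, `rank_candidates`, and `max_fraction` are all assumptions made for illustration.

```python
import random


def perturb_random_offset(value, max_fraction=0.05, rng=random):
    """Shift a result by a uniform random offset of up to +/- max_fraction
    of its value. The uniform distribution and the 5% default are
    illustrative assumptions, not parameters taken from the paper."""
    return value * (1.0 + rng.uniform(-max_fraction, max_fraction))


def rank_candidates(known_results, database):
    """Rank database records by closeness to the attacker's known results.

    known_results: dict mapping test name -> value the attacker holds
    database: dict mapping record id -> dict of test name -> value

    The score is the summed relative distance over the known tests
    (an assumed scoring rule); records missing a test rank last.
    Returns record ids ordered from best to worst match.
    """
    scores = []
    for record_id, results in database.items():
        distance = 0.0
        for test, known in known_results.items():
            if test not in results:
                distance = float("inf")
                break
            distance += abs(results[test] - known) / max(abs(known), 1e-9)
        scores.append((distance, record_id))
    scores.sort()
    return [record_id for _, record_id in scores]
```

In this sketch, an attack succeeds when the true record appears at rank 1 (or within the top few ranks) after the database values have been perturbed; repeating the attack over many patients would give an estimated re-identification rate for a chosen `max_fraction`.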
Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
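The abstract does not state how "altered clinical meaning" was measured. One plausible proxy, sketched below purely as an assumption, is the fraction of results whose low/normal/high category relative to a reference range changes after perturbation. The function names and the reference range used in the usage note are illustrative and are not taken from the paper.

```python
def categorize(value, low, high):
    """Classify a result against a reference range [low, high]."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def fraction_meaning_changed(original, perturbed, low, high):
    """Fraction of results whose category differs after perturbation.

    original, perturbed: equal-length sequences of values for one test
    low, high: reference range bounds (assumed, supplied by the caller)
    """
    if len(original) != len(perturbed):
        raise ValueError("original and perturbed must have the same length")
    if not original:
        return 0.0
    changed = sum(
        categorize(o, low, high) != categorize(p, low, high)
        for o, p in zip(original, perturbed)
    )
    return changed / len(original)
```

For example, with an illustrative range of 135 to 145, original values `[134, 140, 146, 144]` and perturbed values `[136, 141, 147, 146]` give a changed fraction of 0.5, because the first and last results cross a range boundary.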
Discussion and Conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
Database: MEDLINE