Author:
Parisa Movahedi, Valtteri Nieminen, Ileana Montoya Perez, Hiba Daafane, Dishant Sukhwal, Tapio Pahikkala, Antti Airola
Language:
English
Year of publication:
2024
Source:
IEEE Access, Vol. 12, pp. 118637-118648 (2024)
Document type:
article
ISSN:
2169-3536
DOI:
10.1109/ACCESS.2024.3446913
Description:
Differentially private (DP) synthetic data has emerged as a potential solution for sharing sensitive individual-level biomedical data. DP generative models offer a promising approach for generating realistic synthetic data that aims to maintain the original data’s central statistical properties while ensuring privacy by limiting the risk of disclosing sensitive information about individuals. However, how to assess the expected real-world prediction performance of machine learning models trained on synthetic data remains an open question. In this study, we experimentally evaluate two model evaluation protocols for classifiers trained on synthetic data. The first protocol employs solely synthetic data for downstream model evaluation, whereas the second protocol assumes limited DP access to a private test set consisting of real data managed by a data curator. We also propose a metric for assessing how well the evaluation results of the proposed protocols match the real-world prediction performance of the models. The assessment measures both the systematic error component, indicating how optimistic or pessimistic the protocol is on average, and the random error component, indicating the variability of the protocol’s error. The results of our study suggest that employing the second protocol is advantageous, particularly in biomedical health studies where the precision of the research is of utmost importance. Our comprehensive empirical study offers new insights into the practical feasibility and usefulness of different evaluation protocols for classifiers trained on DP-synthetic data.
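A minimal Python sketch of the two ideas summarized in the abstract: an epsilon-DP noisy accuracy estimate on a real test set held by a data curator (the second protocol), and a decomposition of a protocol's error into a systematic (bias) and a random (variability) component. The Laplace mechanism, the function names, and all numbers are illustrative assumptions, not the authors' exact metric or implementation.

```python
import numpy as np


def dp_test_accuracy(y_true, y_pred, epsilon, rng):
    """Illustrative epsilon-DP accuracy estimate on a private real test set.

    The count of correct predictions changes by at most 1 when one test record
    is added or removed, so Laplace noise with scale 1/epsilon on that count
    gives an epsilon-DP estimate, rescaled to an accuracy and clipped to [0, 1].
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = int(np.sum(y_true == y_pred))
    noisy_correct = correct + rng.laplace(scale=1.0 / epsilon)
    return float(np.clip(noisy_correct / len(y_true), 0.0, 1.0))


def protocol_error_components(protocol_estimates, real_world_performance):
    """Split a protocol's evaluation error into systematic and random parts.

    For each repetition the error is (protocol estimate - real-world score).
    Its mean is the systematic component (positive = optimistic on average,
    negative = pessimistic); its standard deviation across repetitions is the
    random component, i.e. the variability of the protocol's error.
    """
    errors = np.asarray(protocol_estimates, dtype=float) - np.asarray(
        real_world_performance, dtype=float
    )
    return {"systematic": float(errors.mean()), "random": float(errors.std(ddof=1))}


# Illustrative usage with made-up numbers from four hypothetical repetitions.
rng = np.random.default_rng(0)
estimates = [0.81, 0.78, 0.84, 0.80]  # a protocol's accuracy estimates
real = [0.76, 0.77, 0.79, 0.75]       # accuracy measured on real held-out data
print(protocol_error_components(estimates, real))

# A synthetic classifier that is right ~80% of the time, evaluated with DP.
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)
print(dp_test_accuracy(y_true, y_pred, epsilon=1.0, rng=rng))
```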
Database:
Directory of Open Access Journals