Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees

Authors: Osorio-Marulanda, Pablo A., Ramirez, John Esteban Castro, Jiménez, Mikel Hernández, Reyes, Nicolas Moreno, Unanue, Gorka Epelde
Year of publication: 2024
Subject:
Document type: Working Paper
Description: The creation of synthetic data models has been a significant advance across diverse scientific fields, but the technology also raises important privacy concerns for users. This work enhances DPNPC, a non-parametric copula-based synthetic data generation model, by incorporating Differential Privacy through an Enhanced Fourier Perturbation method. The model generates synthetic data for mixed tabular databases while preserving privacy. We compare DPNPC with three other models (PrivBayes, DP-Copula, and DP-Histogram) on three public datasets, evaluating privacy, utility, and execution time. DPNPC outperforms the other models in modeling multivariate dependencies, maintaining privacy for small $\epsilon$ values, and reducing training times. However, limitations include the need to assess the model's performance under different encoding methods and against additional privacy attacks. Future research should address these areas to further improve privacy-preserving synthetic data generation.
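The description names an Enhanced Fourier Perturbation method as the privacy mechanism but does not say which quantity DPNPC perturbs. As a rough illustration only, the Python sketch below applies a simplified Fourier perturbation to a hypothetical one-dimensional histogram (for example, a binned marginal). It keeps the first `k` coefficients of an orthonormal real FFT, adds Laplace noise calibrated to their L1 sensitivity, and inverts the transform. This is not the full Enhanced Fourier Perturbation Algorithm, which also chooses `k` privately, and it is not the authors' implementation. The function name `fourier_perturb_histogram`, the fixed `k`, and the add/remove-one-record neighbour definition are all assumptions.

```python
import numpy as np


def fourier_perturb_histogram(counts, k, epsilon, rng=None):
    """Hypothetical sketch: epsilon-DP release of a 1-D histogram by perturbing
    its first k Fourier coefficients (simplified, fixed-k Fourier perturbation).

    Assumes add/remove-one-record neighbours, so a single bin changes by at most 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    n = counts.size

    # Orthonormal real FFT: coefficients 0 .. n // 2 of the full spectrum.
    coeffs = np.fft.rfft(counts, norm="ortho")
    # k cannot exceed the number of available rfft coefficients.
    k = min(k, coeffs.size)

    # Changing one bin by 1 moves every orthonormal coefficient by magnitude
    # 1 / sqrt(n), so |Re| + |Im| <= sqrt(2 / n) per coefficient. The L1
    # sensitivity of the 2k real numbers we keep is therefore <= k * sqrt(2 / n).
    scale = k * np.sqrt(2.0 / n) / epsilon

    # Laplace mechanism on the real and imaginary parts of the kept coefficients.
    noisy = coeffs[:k] + rng.laplace(0.0, scale, k) + 1j * rng.laplace(0.0, scale, k)

    # Zero out the discarded high-frequency coefficients.
    truncated = np.zeros_like(coeffs)
    truncated[:k] = noisy

    # Inverse transform and clipping are post-processing, so privacy is preserved.
    released = np.fft.irfft(truncated, n=n, norm="ortho")
    return np.clip(released, 0.0, None)
```

The noise scale grows linearly with `k`, so keeping fewer coefficients adds less noise but discards more high-frequency detail. The full Enhanced Fourier Perturbation Algorithm addresses this trade-off by spending part of the privacy budget to choose `k` instead of fixing it.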
Comment: 12 pages, 5 figures, deciding 2025 conference to which to submit
Database: arXiv