Differential Privacy for Text Analytics via Natural Text Sanitization

Author: Yue, Xiang, Du, Minxin, Wang, Tianhao, Li, Yaliang, Sun, Huan, Chow, Sherman S. M.
Publication year: 2021
Subject:
Document type: Working Paper
Description: Texts convey sophisticated knowledge. However, texts also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still provide low utility, as cursed by the high-dimensional text representation. The companion issue of utilizing sanitized texts for downstream analytics is also under-explored. This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via our new local DP notion. The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility. Surprisingly, the high utility does not boost the success rate of inference attacks.
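The abstract only sketches the mechanism, so the following is a minimal illustrative sketch, not the paper's exact algorithm: a metric-style local DP word sanitizer that samples replacement words via an exponential mechanism weighted by embedding similarity, and that perturbs only words flagged as sensitive. The names `embed`, `vocab`, `sensitive`, and `epsilon` are assumptions introduced for this sketch.

```python
# Illustrative sketch only (not the authors' exact mechanism): exponential-mechanism
# word replacement over an embedding metric, applied only to sensitive tokens.
import numpy as np

def sanitize_token(token, vocab, embed, epsilon, rng=np.random.default_rng()):
    """Sample a replacement word with probability proportional to
    exp(-epsilon/2 * distance), so more similar words are more likely."""
    x = embed[token]
    # Negative Euclidean distance between embeddings as the similarity score.
    scores = np.array([-np.linalg.norm(x - embed[w]) for w in vocab])
    probs = np.exp(0.5 * epsilon * scores)
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

def sanitize_text(tokens, sensitive, vocab, embed, epsilon):
    """Keep non-sensitive tokens verbatim; randomize the sensitive ones,
    mirroring the idea of weighing sensitivity alongside similarity."""
    return [sanitize_token(t, vocab, embed, epsilon) if t in sensitive else t
            for t in tokens]
```

A usage example would pass a tokenized sentence, a set of sensitive tokens, a vocabulary list, and a dict mapping each word to its embedding vector; the sanitized output can then feed downstream pretraining or fine-tuning as described above.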
Comment: ACL-IJCNLP'21 Findings; the first two authors contributed equally
Database: arXiv