Differential Privacy for Text Analytics via Natural Text Sanitization
Authors: | Yue, Xiang, Du, Minxin, Wang, Tianhao, Li, Yaliang, Sun, Huan, Chow, Sherman S. M. |
---|---|
Publication year: | 2021 |
Document type: | Working Paper |
Description: | Texts convey sophisticated knowledge. However, texts also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still provide low utility, hampered by the high dimensionality of text representations. The companion issue of utilizing sanitized texts for downstream analytics is also under-explored. This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via our new local DP notion. The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility. Surprisingly, the high utility does not increase the success rate of inference attacks. Comment: ACL-IJCNLP'21 Findings; The first two authors contributed equally |
Database: | arXiv |
External link: |
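The description mentions a local DP notion that balances sensitivity and similarity when replacing words. The paper's actual mechanism is not reproduced in this record; below is a minimal illustrative sketch of the general idea, using the exponential mechanism to sample a replacement token with probability weighted by embedding similarity. The toy vocabulary, 2-D embeddings, and function names are all assumptions for illustration only.

```python
import math
import random

# Hypothetical toy vocabulary with made-up 2-D embeddings
# (not the embeddings or vocabulary used in the paper).
EMBED = {
    "ill":    (0.90, 0.10),
    "sick":   (0.85, 0.15),
    "unwell": (0.80, 0.20),
    "happy":  (0.10, 0.90),
}

def similarity(u, v):
    """Negative Euclidean distance as a utility score: closer words score higher."""
    return -math.dist(u, v)

def sanitize_token(token, epsilon, rng=random):
    """Replace `token` with a vocabulary word sampled via the exponential
    mechanism: more similar words are more likely, and epsilon controls
    how strongly the sampling concentrates on close neighbors."""
    if token not in EMBED:
        return token  # out-of-vocabulary tokens pass through unchanged
    scores = {w: similarity(EMBED[token], e) for w, e in EMBED.items()}
    # Exponential-mechanism weights (score sensitivity folded into epsilon here).
    weights = {w: math.exp(epsilon * s / 2.0) for w, s in scores.items()}
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for w, wt in weights.items():
        acc += wt
        if r <= acc:
            return w
    return token

def sanitize(text, epsilon):
    """Sanitize a whitespace-tokenized string token by token."""
    return " ".join(sanitize_token(t, epsilon) for t in text.split())
```

With a small epsilon the sampling is close to uniform over the vocabulary (strong privacy, low utility); with a large epsilon the original token is returned with high probability (weak privacy, high utility).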