Learning New Words from Keystroke Data with Local Differential Privacy

Autor: Sungwook Kim, Soo-Hyung Kim, Hyejin Shin, Baek Chunghun, Jun-bum Shin
Rok vydání: 2020
Předmět:
Zdroj: IEEE Transactions on Knowledge and Data Engineering. 32:479-491
ISSN: 2326-3865
1041-4347
DOI: 10.1109/tkde.2018.2885749
Popis: Keystroke data collected from smart devices includes various sensitive information about users. Collecting and analyzing such data raise serious privacy concerns. Google and Apple have recently applied local differential privacy (LDP) to address privacy issue on learning new words from users’ keystroke data. However, these solutions require multiple LDP reports for a single word, which result in inefficient use of privacy budget and high computational cost. In this paper, we develop a novel algorithm for learning new words under LDP. Unlike the existing solutions, the proposed method generates only one LDP report for a single word. This enables the proposed method to use full privacy budget for generating a report and brings the benefit that the proposed method provides better utility at the same privacy degree than the existing methods. In our algorithm, each user appends a hash value to new word and sends only one LDP report of an $n$ n -gram selected randomly from the string packed by each new word and its hash value. The server then decodes frequent $n$ n -grams at each position of the string and discovers the candidate words by exploring graph-theoretic links between $n$ n -grams and checking integrity of candidates with hash values. Frequencies of frequent new words discovered are estimated from distribution estimates of $n$ n -grams by robust regression. We theoretically show that our algorithm can recover popular new words even though the server does not know the domain of the raw data. In addition, we theoretically and empirically demonstrate that our algorithm achieves higher accuracy compared to the existing solutions.
Databáze: OpenAIRE