Privacy vs. correlation in information retrieval and aggregation

Author: Naim, Carolina
Year of publication: 2023
DOI: 10.7282/t3-md2t-8m96
Description: Privacy is now a major challenge encountered by users who can unknowingly reveal critical personal information through their online activities. Due to the correlation over time between the different behaviors of an online user or the correlation between his attributes, care should be taken when proposing privacy solutions. The main goal of this dissertation is to explore this interplay between privacy and correlation. To that end, we consider two problems that examine this tension: (i) ON-OFF privacy with correlated requests and (ii) private multi-group aggregation. We start by considering the problem of ON-OFF privacy in which a user is interested in the latest message generated by one of n sources available at a server. The user has the choice to turn privacy ON or OFF depending on whether he wants to hide his interest at the time or not. The challenge is that the statistical correlation over time of a user’s online behavior can lead to information leakage. As a consequence of correlation, the user cannot simply ignore privacy when privacy is OFF. We model the correlation between a user’s requests by an n-state Markov chain. Our goal is to design ON-OFF privacy schemes with optimal download rates that ensure privacy for past and future requests. We present inner and outer bounds on the achievable download rate for n sources. We also devise an efficient algorithm to construct an ON-OFF privacy scheme achieving the inner bound and prove its optimality for special families of Markov chains, such as in the case of n = 2 sources. In general, for n > 2, finding tighter outer bounds and efficient constructions of ON-OFF privacy schemes that would achieve them remains an open problem. We then study the differentially private multi-group aggregation (PMGA) problem. This setting involves a single server and n users. Each user belongs to one of k distinct groups and holds a discrete value.
The goal is to design schemes that allow the server to find the aggregate (sum) of the values in each group (with high accuracy) under communication and local differential privacy constraints. The privacy constraint guarantees that the user’s group remains private. This is motivated by applications where a user’s group can reveal sensitive information, such as his religious and political beliefs, health condition, or race. The challenge is that the user’s group and value can be correlated. We propose a novel scheme for PMGA, dubbed Query and Aggregate (Q&A). The novelty of Q&A is that it is an interactive aggregation scheme. In Q&A, each user is assigned a random query matrix and sends the server an answer to it based on his group and value. We characterize the Q&A scheme’s performance in terms of accuracy (MSE), privacy, and communication. We compare Q&A to the Randomized Group (RG) scheme, which is non-interactive and adapts existing randomized response schemes to the PMGA setting. We observe that Q&A typically outperforms RG, in terms of privacy vs. utility, in the high privacy regime.
Database: OpenAIRE