SAQE
Author: | Yongjoo Park, Xi He, Jennie Rogers, Xiao Wang, Johes Bater |
---|---|
Publication year: | 2020 |
Subject: |
Computer science; 05 social sciences; General Engineering; 050801 communication & media studies; 010501 environmental sciences; Cryptographic protocol; 01 natural sciences; Pipeline (software); 0508 media and communications; Secure multi-party computation; Key (cryptography); Overhead (computing); Differential privacy; Data mining; Raw data; Private information retrieval; 0105 earth and related environmental sciences |
Source: | Proceedings of the VLDB Endowment. 13:2691-2705 |
ISSN: | 2150-8097 |
Description: | A private data federation enables clients to query the union of data from multiple data providers without revealing any extra private information to the client or to any other data provider. Unfortunately, this strong end-to-end privacy guarantee requires cryptographic protocols that incur a significant performance overhead, as high as 1,000x compared to executing the same query in the clear. As a result, private data federations are impractical for common database workloads. This gap reveals the key challenge in a private data federation: offering fast and accurate query answers without compromising strong end-to-end privacy. To address this challenge, we propose SAQE, the Secure Approximate Query Evaluator, a private data federation system that scales to very large datasets by combining three techniques (differential privacy, secure computation, and approximate query processing) in a novel and principled way. First, SAQE adds novel secure sampling algorithms into the federation's query processing pipeline to speed up query workloads and to minimize the noise the system must inject into the query results to protect the privacy of the data. Second, we introduce a query planner that jointly optimizes the noise introduced by differential privacy with the sampling rates and resulting error bounds owing to approximate query processing. Our research shows that these three techniques are synergistic: sampling within certain accuracy bounds improves both query privacy and performance, meaning that SAQE executes over less data than existing techniques without sacrificing efficiency, privacy, or accuracy. Using our optimizer, we leverage this counter-intuitive result to identify an inflection point that maximizes all three criteria prior to query evaluation. 
Experimentally, we show that this result enables SAQE to trade off among these three criteria to scale its query processing to very large datasets, with accuracy bounds that depend only on the sample size and not on the raw data size. |
Database: | OpenAIRE |
External link: |
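
The interplay between sampling and differential privacy that the abstract describes can be illustrated with a minimal sketch. This is not SAQE's algorithm: it runs in the clear with no secure computation, and it assumes a single COUNT query, Bernoulli row sampling, and the Laplace mechanism. The function names (`amplified_epsilon`, `private_sampled_count`) are hypothetical. The formula `ln(1 + p(e^eps - 1))` is the standard privacy-amplification-by-subsampling bound, used here as an assumption about how sampling can reduce effective privacy loss.

```python
import math
import random


def amplified_epsilon(epsilon: float, rate: float) -> float:
    """Effective privacy loss when an epsilon-DP mechanism is run on a
    Bernoulli sample taken with probability `rate` (standard bound)."""
    return math.log(1.0 + rate * (math.exp(epsilon) - 1.0))


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_sampled_count(rows, predicate, rate, epsilon, rng):
    """Approximate, noisy COUNT: sample rows, count matches, add Laplace
    noise calibrated to sensitivity 1, then scale up by 1/rate."""
    sample_count = sum(1 for r in rows if rng.random() < rate and predicate(r))
    noisy = sample_count + laplace_noise(1.0 / epsilon, rng)
    return noisy / rate


if __name__ == "__main__":
    rng = random.Random(7)
    rows = range(100_000)
    estimate = private_sampled_count(rows, lambda r: r % 2 == 0, 0.1, 1.0, rng)
    print("estimated count:", round(estimate))
    print("effective epsilon:", amplified_epsilon(1.0, 0.1))
```

In this sketch the query touches roughly 10% of the rows, and the privacy loss with respect to the full dataset is smaller than the epsilon spent on the sample, which is the kind of synergy the abstract refers to. The sampling error of the estimate depends on the sample size rather than on the size of the raw data.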