Popis: |
In recent years, an increasing amount of data is collected in different and often, not cooperative, databases. The problem of privacy-preserving, distributed calculations over separated databases and, a relative to it, issue of private data release were intensively investigated. However, despite a considerable progress, computational complexity, due to an increasing size of data, remains a limiting factor in real-world deployments, especially in case of privacy-preserving computations. In this paper, we present a general method for trade off between performance and accuracy of distributed calculations by performing data sampling. Sampling was a topic of extensive research that recently received a boost of interest. We provide a sampling method targeted at separate, non-collaborating, vertically partitioned datasets. The method is exemplified and tested on approximation of intersection set both without and with privacy-preserving mechanism. An analysis of the bound on error as a function of the sample size is discussed and heuristic algorithm is suggested to further improve the performance. The algorithms were implemented and experimental results confirm the validity of the approach. |