The Bloom Filter Design for Numerical Range Query

Autor: Kun-Yang Fan, 范坤揚
Rok vydání: 2008
Druh dokumentu: 學位論文 ; thesis
Popis: 96
A Bloom filter is a simple space-efficient randomized data structure for concisely representing a data set. The property of its randomization has great potential for distributed network systems, and it supports the membership query with a small false positive rate, which is the probability that an element was not in the data set but Bloom filter reported it is. There have been many studies on how to improve the correctness of Bloom filter by reducing the false positive rate. However, little research has been done on Bloom filter design for numerical range query. Since a Bloom filter can only represent a limited number of elements, when a large range of numerical attributes are inserted into a Bloom filter, the false positive rate increases dramatically. In this thesis we present efficient Bloom filter design for numerical ranges. First, Division scheme reduces the number of elements inserted by grouping the numerical range into divisions, i.e., numbers in the same division are treated as the same element. On the other hand, Overlapping scheme reduced the number of bits inserted in the Bloom filter by overlapping the inserted bits of consecutive numbers. In addition, Division and Overlapping scheme combines the techniques of the aforementioned two schemes. Analytic model was used to derive the false positive rates of the schemes. Computer simulations were used to verify the correctness of the analytic model. Moreover, the optimal configuration of Bloom filter representing a numeric range of single attribute can be obtained, i.e., the false positive rate is minimized. A heuristic algorithm has been developed to obtain near optimal configurations for multiple attributes. The Division and Overlapping scheme extends the Bloom filter design for numerical range query, where traditional Bloom filter cannot be used.
Databáze: Networked Digital Library of Theses & Dissertations