BPTree
Autor: | Jelani Nelson, David P. Woodruff, Vladimir Braverman, Nikita Ivkin, Stephen R. Chestnut, Zhengyu Wang |
---|---|
Rok vydání: | 2017 |
Předmět: |
Data stream mining
0102 computer and information sciences 02 engineering and technology Binary logarithm 01 natural sciences Infimum and supremum 010201 computation theory & mathematics Bounding overwatch 020204 information systems Chaining 0202 electrical engineering electronic engineering information engineering Bernoulli process Constant (mathematics) Algorithm Independence (probability theory) Mathematics |
Zdroj: | PODS |
Popis: | The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. One is given a list i1,i2,...,im∈[n] and the goal is to identify the items among [n] that appear frequently in the list. In sub-polynomial space, the strongest guarantee available is the l2 guarantee, which requires finding all items that occur at least e||ƒ||2 times in the stream, where the vector ƒ∈Rn is the count histogram of the stream with ith coordinate equal to the number of times i appears ƒi:=#{je[m]:ij=i. The first algorithm to achieve the l2 guarantee was the CountSketch of [11], which requires O(e-2log n) words of memory and O(log n) update time and is known to be space-optimal if the stream allows for deletions. The recent work of [7] gave an improved algorithm for insertion-only streams, using only O(e-2loge-1log log n) words of memory. In this work, we give an algorithm BPTree for l2 heavy hitters in insertion-only streams that achieves O(e-2loge-1) words of memory and O(loge-1) update time, which is the optimal dependence on n and m. In addition, we describe an algorithm for tracking ||ƒ||2 at all times with O(e-2) memory and update time. Our analyses rely on bounding the expected supremum of a Bernoulli process involving Rademachers with limited independence, which we accomplish via a Dudley-like chaining argument that may have applications elsewhere. |
Databáze: | OpenAIRE |
Externí odkaz: |