On Power-Law Distributed Balls in Bins and its Applications to View Size Estimation
Autor: | Ioannis Atsonios, Olivier Beaumont, Yusik Kim, Nicolas Hanusse |
---|---|
Přispěvatelé: | Algorithmics for computationally intensive applications over wide scale distributed platforms (CEPAGE), Université Sciences et Technologies - Bordeaux 1 (UB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Machine Learning and Optimisation (TAO), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Université Sciences et Technologies - Bordeaux 1-Inria Bordeaux - Sud-Ouest, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris-Sud - Paris 11 (UP11)-Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-CentraleSupélec |
Jazyk: | angličtina |
Rok vydání: | 2011 |
Předmět: |
Discrete mathematics
0102 computer and information sciences 02 engineering and technology Expected value 01 natural sciences Upper and lower bounds Power law Stochastic ordering Combinatorics Constant factor symbols.namesake Power Law 010201 computation theory & mathematics 020204 information systems 0202 electrical engineering electronic engineering information engineering Ball (bearing) symbols Balls into Bins Pareto distribution View Size Estimation [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC] Coupon collector's problem Mathematics |
Zdroj: | ISAAC ISAAC, Dec 2011, Yokohama, Japan Algorithms and Computation ISBN: 9783642255908 |
Popis: | International audience; The view size estimation plays an important role in query optimization. It has been observed that many data follow a power law distribution. In this paper, we consider the balls in bins problem where we place balls into $N$ bins when the bin selection probabilities follow a power law distribution. As a generalization to the coupon collector's problem, we address the problem of determining the expected number of balls that need to be thrown in order to have at least one ball in each of the $N$ bins. We prove that $\Theta(\frac{N^\alpha \ln N}{c_N^{\alpha}})$ balls are needed to achieve this where $\alpha$ is the parameter of the power law distribution and $c_N^{\alpha}=\frac{\alpha-1}{\alpha-N^{\alpha-1}}$ for $\alpha \neq 1$ and $c_N^{\alpha}=\frac{1}{\ln N}$ for $\alpha=1$. Next, when fixing the number of balls that are thrown to $T$, we provide closed form upper and lower bounds on the expected number of bins that have at least one occupant. For $n$ large and $\alpha>1$, we prove that our bounds are tight up to a constant factor of $\left(\frac{\alpha}{\alpha-1}\right)^{1-\frac{1}{\alpha}} \leq e^{1/e} \simeq 1.4$. |
Databáze: | OpenAIRE |
Externí odkaz: |