A theoretical and empirical study of affirmative sampling

Autor: Montes Sanabria, Jordi
Přispěvatelé: Martínez Parra, Conrado, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Jazyk: angličtina
Rok vydání: 2020
Předmět:
Zdroj: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Popis: Distinct random sampling is widely used in many applications due to its ability to answer aggregate queries on data with provable probabilistic guarantees on the quality of the answer. We consider the data stream model where a single pass over the data is allowed and only a small number of elements in the stream can be stored. We analyze Affirmative Sampling, an algorithm for distinct random sampling with adaptive sample size. We discuss an unbiased cardinality estimator of the stream using the elements in the random sample from Affirmative sampling and provide bounds for its accuracy. Random samples can be used for much more than just cardinality estimation. We introduce Nooh, a fast genome distance estimation tool that uses Affirmative sampling to compete with the current state-of-the-art tools in bioinformatics.
Databáze: OpenAIRE