Investigating Methods for Weighted Reservoir Sampling with Replacement

Author: Meligrana, Adriano
Year of publication: 2024
Subject:
Document type: Working Paper
Description: Reservoir sampling techniques can be used to extract a sample from a population of unknown size whose units are observed sequentially. Most attention has been devoted to sampling without replacement, with only a small number of studies focusing on sampling with replacement. In this paper, we clarify some statements appearing in the literature about the reduction of reservoir sampling with replacement to single reservoir sampling without replacement, exploring in detail how to deal with the weighted case. Then, we demonstrate that the results shown in Park et al. (2004) can be further generalized to develop a skip-based algorithm that is more efficient than previous methods and, additionally, we provide a single-pass merging strategy that can be executed on multiple streams in parallel. Finally, we establish that the skip-based algorithm is faster than standard methods when used to extract a single sample from the population in a non-streaming scenario, provided the sample ratio is below approximately 10% of the population.
Database: arXiv
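
Illustrative note: the reduction mentioned in the description, where a with-replacement sample of size k is obtained from k independent single-item reservoirs, can be sketched as follows. This is a minimal baseline written for this record, not the paper's skip-based algorithm; the function name, the `(item, weight)` stream format, and the choice to skip non-positive weights are all assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir_with_replacement(
    stream: Iterable[Tuple[T, float]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Draw k items with replacement, each with probability proportional
    to its weight, in one pass over a stream of unknown length.

    Hypothetical helper: each of the k slots behaves as an independent
    single-item weighted reservoir.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    total_weight = 0.0
    for item, weight in stream:
        if weight <= 0:
            # Assumption: items with non-positive weight can never be drawn.
            continue
        total_weight += weight
        if not reservoir:
            # The first valid item fills every slot.
            reservoir = [item] * k
            continue
        # Each slot independently switches to the new item with
        # probability weight / (sum of weights seen so far).
        p = weight / total_weight
        for slot in range(k):
            if rng.random() < p:
                reservoir[slot] = item
    return reservoir
```

Each slot ends up holding item i with probability w_i divided by the total weight, because the replacement probabilities telescope over the stream. The cost is one random draw per slot per item, which is the per-item overhead that skip-based approaches aim to avoid.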