Trade-offs Between Query Difficulty and Sample Complexity in Crowdsourced Data Acquisition

Autor:	Hye Won Chung, Doyeon Kim, Alfred O. Hero, Ji Oon Lee
Rok vydání:	2018
Předmět:	021103 operations research Sample complexity Trade offs 0211 other engineering and technologies Binary number 020206 networking & telecommunications 02 engineering and technology Combinatorics Data acquisition Average size Fountain code 0202 electrical engineering electronic engineering information engineering Erasure Parity (mathematics) Mathematics
Zdroj:	Allerton
DOI:	10.1109/allerton.2018.8636012
Popis:	Consider a crowdsourcing system whose task is to classify $k$ objects in a database into two groups depending on the binary attributes of the objects. Here we propose a parity response model: the worker is asked to check whether the number of objects having a given attribute in the chosen subset is even or odd. A worker either responds with a correct binary answer or declines to respond. We propose a method for designing the sequence of subsets of objects to be queried so that the attributes of the objects can be identified with high probability using few (${n}$) answers. The method is based on an analogy to the design of Fountain codes for erasure channels. We define the query difficulty $\overline {d}$ as the average size of the query subsets and we define the sample complexity $n$ as the minimum number of collected answers required to attain a given recovery accuracy. We obtain fundamental tradeoffs between recovery accuracy, query difficulty, and sample complexity. In particular, the necessary and sufficient sample complexity required for recovering all $k$ attributes with high probability is $n = c_{0}\max\{k, (k\,\log\, k)/\overline {d}\}$ and the sample complexity for recovering a fixed proportion $(1-\delta )k$ of the attributes for $\delta =o(1)$ is $n=c_{1} \max \{k, (\mathrm {k}\log (1/\delta ))/\overline {d}\}$, where $c_{0},\, c_{1} >0.$
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::5ed0626fab5af47c7446009b3bf5621d https://doi.org/10.1109/allerton.2018.8636012 Zobrazit plný text záznamu