On the efficiency of data collection for crowdsourced classification

Author:	Nicholas R. Jennings, Long Tran-Thanh, Edoardo Manino
Year of publication:	2018
Subject:	Data collection; Process (engineering); Computer science; Aggregate (data warehouse); Sampling (statistics); Software engineering; Engineering and technology; Variable (computer science); Empirical research; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Quality (business); Data mining; Representation (mathematics)
Source:	Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18); Scopus-Elsevier; Web of Science; IJCAI
Description:	The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::22dfeb57619b1880f847f397f915e9fa http://hdl.handle.net/10044/1/64569
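
Illustration (not part of the record): the abstract contrasts non-adaptive uniform allocation with adaptive uncertainty sampling and describes the collection process as a random walk. The following is a minimal sketch of that idea under simplifying assumptions that are not taken from the paper: binary tasks, workers who are all correct with the same known probability `p`, and a true class of +1 for every task. The function name `simulate` and all parameter values are hypothetical, and information gain maximisation is not implemented here.

```python
import heapq
import random


def simulate(policy, n_tasks=2000, labels_per_task=5, p=0.7, seed=0):
    """Return (accuracy, labels_used) for one simulated collection run.

    Assumptions (illustrative, not from the paper): binary tasks, every
    worker is correct with the same known probability p, and the true class
    is +1 for all tasks (harmless by symmetry). Under these assumptions the
    posterior log-odds of a task is z * log(p / (1 - p)), where
    z = (#correct - #incorrect) labels, so z is a biased +/-1 random walk.
    """
    rng = random.Random(seed)
    budget = n_tasks * labels_per_task
    z = [0] * n_tasks  # one random-walk state per task

    def step(i):
        # One crowdsourced label: moves the walk up if the worker is correct.
        z[i] += 1 if rng.random() < p else -1

    if policy == "uniform":
        # Non-adaptive: every task receives the same number of labels.
        for i in range(n_tasks):
            for _ in range(labels_per_task):
                step(i)
    elif policy == "uncertainty":
        # Adaptive: always label the task whose |z| is currently smallest.
        heap = [(0, i) for i in range(n_tasks)]
        heapq.heapify(heap)
        for _ in range(budget):
            _, i = heapq.heappop(heap)
            step(i)
            heapq.heappush(heap, (abs(z[i]), i))
    else:
        raise ValueError(f"unknown policy: {policy}")

    # Predict the sign of z; a tie (z == 0) counts as half correct,
    # which is the expected score of a fair coin flip.
    correct = sum(1.0 if v > 0 else 0.5 if v == 0 else 0.0 for v in z)
    return correct / n_tasks, budget


if __name__ == "__main__":
    for policy in ("uniform", "uncertainty"):
        accuracy, used = simulate(policy)
        print(policy, round(accuracy, 3), used)
```

Both policies spend the same label budget, so any accuracy difference between the two printed lines comes only from how the labels are allocated across tasks.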