Drawing Sound Conclusions from Noisy Judgments

Authors:	Wei Min, Xiao Wang, David E. Goldberg, Zongru Wan, Andrew Trotman
Publication year:	2017
Subject:	Computer science; Engineering and technology; Machine learning; Crowdsourcing; Search engine; Annotation; Information systems; Metric (mathematics); Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Quality (business); Relevance (information retrieval); Artificial intelligence
Source:	WWW
Description:	The quality of a search engine is typically evaluated using hand-labeled data sets, where the labels indicate the relevance of documents to queries. Often the number of labels needed is too large to be created by the best annotators, and so less accurate labels (e.g., from crowdsourcing) must be used. This introduces errors in the labels, and thus errors in standard precision metrics (such as P@k and DCG): the lower the quality of the judge, the more errors the labels contain and, consequently, the less accurate the metric. We introduce equations and algorithms that can adjust the metrics to the values they would have had if there were no annotation errors. This is especially important when two search engines are compared by comparing their metrics. We give examples where one engine appeared to be statistically significantly better than the other, but the effect disappeared after the metrics were corrected for annotation error. In other words, the evidence supporting a statistical difference was illusory, and was caused by a failure to account for annotation error.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::5da51ced9547e6b0397f565f9c0fc0a1 https://doi.org/10.1145/3038912.3052570
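
The description mentions adjusting precision metrics to the values they would have had without annotation errors, but this record does not reproduce the paper's equations. As a rough illustration of the general idea only, the sketch below applies a standard misclassification correction to P@k. It assumes binary relevance labels and a judge whose true-positive rate (`tpr`) and false-positive rate (`fpr`) are known and do not depend on rank; these names, the function names, and the sample labels are all hypothetical and are not taken from the paper.

```python
def precision_at_k(labels, k):
    """Fraction of the top-k results labeled relevant (labels are 0 or 1)."""
    top = labels[:k]
    if not top:
        raise ValueError("need at least one label in the top k")
    return sum(top) / len(top)


def corrected_precision(observed, tpr, fpr):
    """Estimate the error-free precision from the observed one.

    Assumed noise model (not from the paper):
        observed = true * tpr + (1 - true) * fpr
    Solving for `true` gives (observed - fpr) / (tpr - fpr).
    """
    if tpr <= fpr:
        raise ValueError("judge must be better than chance (tpr > fpr)")
    estimate = (observed - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical crowdsourced labels for one ranked list of ten results.
noisy_labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observed = precision_at_k(noisy_labels, 10)
corrected = corrected_precision(observed, tpr=0.8, fpr=0.2)
print(f"observed P@10 = {observed:.2f}, corrected P@10 = {corrected:.2f}")
```

Under these assumed error rates the corrected value differs noticeably from the observed one, which is the kind of shift that can change the outcome of a comparison between two engines. A rank-weighted metric such as DCG would need its own treatment and is not covered by this sketch.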