Natural Questions: A Benchmark for Question Answering Research

Author:	Kwiatkowski, Tom; Palomaki, Jennimaria; Redfield, Olivia; Collins, Michael; Parikh, Ankur; Alberti, Chris; Epstein, Danielle; Polosukhin, Illia; Devlin, Jacob; Lee, Kenton; Toutanova, Kristina; Jones, Llion; Kelcey, Matthew; Chang, Ming-Wei; Dai, Andrew M.; Uszkoreit, Jakob; Le, Quoc; Petrov, Slav
Language:	English
Publication year:	2019
Subject:	Computational linguistics. Natural language processing P98-98.5
Source:	Transactions of the Association for Computational Linguistics, Vol 7, Pp 453-466 (2019)
Document type:	article
ISSN:	2307-387X
DOI:	10.1162/tacl_a_00276
Description:	We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 5-way annotated examples sequestered as test data. We present experiments validating the quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/8650fdc04d7944c4893d0b995b6de6f7