The First Wikipedia Questions and Factoid Answers Corpus in the Thai Language

Authors: Santipong Thaiprayoon, Anocha Rugchatjaroen, Kanokorn Trakultaweekoon, Pornpimon Palingoon
Publication year: 2019
Subject:
Source: 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP).
DOI: 10.1109/isai-nlp48611.2019.9045143
Description: This article introduces a Thai question-answer corpus for the question-answering task, extracted from Thai Wikipedia as downloaded on 17 December 2017. The answers comprise 5,000 annotated factoids. Each corresponding question is either an exact phrase or sentence that contains the answer, with the answer replaced by a question word, or a synthetic question built from phrases and/or sentences on the wiki page. A question must contain only one of a set of 7 specific question words, and complex questions must be avoided. Fifteen annotators used an annotation system designed specifically for this task, and a language specialist monitored the acceptance, rejection, and revision processes. The final set was divided into 4,000 pairs for training and 1,000 pairs for validation. A baseline evaluation yielded an F1 score of 27.25 for the document reader and 71.24 for document retrieval.
Database: OpenAIRE