The First Wikipedia Questions and Factoid Answers Corpus in the Thai Language
Author: | Santipong Thaiprayoon, Anocha Rugchatjaroen, Kanokorn Trakultaweekoon, Pornpimon Palingoon |
---|---|
Publication year: | 2019 |
Subject: | Interrogative word; Computer science; Factoid; Complex question; 02 engineering and technology; Task (project management); Set (abstract data type); 03 medical and health sciences; Annotation; 0302 clinical medicine; 030221 ophthalmology & optometry; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Baseline (configuration management); F1 score; Natural language processing |
Source: | 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). |
DOI: | 10.1109/isai-nlp48611.2019.9045143 |
Description: | This article introduces a Thai question-answer corpus for the question-answering task, extracted from a copy of Thai Wikipedia downloaded on 17 December 2017. The answers comprise 5,000 annotated factoids. Each corresponding question is either an exact phrase or sentence containing the answer, with the answer replaced by a question word, or a synthetic question composed from phrases and/or sentences on the wiki page. A question must contain exactly one of a set of 7 specific question words, and complex questions must be avoided. Fifteen annotators used an annotation system designed specifically for this task. Acceptance, rejection, and revision were monitored by a language specialist. The final set was divided into 4,000 pairs for training and 1,000 pairs for validation. A baseline evaluation yielded an F1 score of 27.25 for document readers and 71.24 for document retrieval. |
Database: | OpenAIRE |
External link: |
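
The description reports baseline F1 scores but this record does not say how they were computed. As a minimal sketch only, the following shows the token-overlap F1 that is commonly used for factoid answers. It is an assumption, not the authors' confirmed metric. The function name `token_f1` is hypothetical, and the inputs are assumed to be already segmented into word tokens, since Thai text is not whitespace-delimited and the segmenter used in the paper is not stated here.

```python
from collections import Counter
from typing import List


def token_f1(predicted: List[str], reference: List[str]) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Hypothetical helper: both arguments are lists of word tokens that
    the caller has already produced with a word segmenter of their choice.
    """
    # Two empty answers count as a match; one empty answer is a miss.
    if not predicted or not reference:
        return 1.0 if predicted == reference else 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level score under this assumption would be the mean of `token_f1` over the 1,000 validation pairs, multiplied by 100 to match the scale of the figures quoted in the description.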