AQAD: 17,000+ Arabic Questions for Machine Comprehension of Text

Authors: Marwan Torki, Bassam Mattar, Eman Elrefai, Adel Atef, Sandra Sherif
Publication year: 2020
Subject:
Source: AICCSA
Description: Current Arabic machine reading datasets for question answering suffer from important shortcomings: the available datasets are either small, high-quality collections or large, low-quality ones. To address this problem, we present the Arabic Question-Answer Dataset (AQAD), a new large, high-quality Arabic reading comprehension dataset consisting of 17,000+ questions and answers. To collect AQAD, we present a fully automated data collector. The collector works on a set of Arabic Wikipedia articles for the extractive question answering task. The chosen articles match the articles used in the well-known Stanford Question Answering Dataset (SQuAD). We provide evaluation results on AQAD using two state-of-the-art models for machine-reading question answering, BERT and BiDAF, which achieve F1 scores of 0.37 and 0.32, respectively.
Database: OpenAIRE