AQAD: 17,000+ Arabic Questions for Machine Comprehension of Text
Authors: | Marwan Torki, Bassam Mattar, Eman Elrefai, Adel Atef, Sandra Sherif |
---|---|
Publication year: | 2020 |
Subject: | Machine translation; business.industry; Computer science; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Comprehension; Set (abstract data type); ComputingMethodologies_PATTERNRECOGNITION; Reading comprehension; 0202 electrical engineering, electronic engineering, information engineering; Question answering; Information system; Encyclopedia; 020201 artificial intelligence & image processing; Electronic publishing; Artificial intelligence; business; computer; Natural language processing |
Source: | AICCSA |
Description: | Current Arabic Machine Reading for Question Answering datasets suffer from important shortcomings. The available datasets are either small, high-quality collections or large, low-quality ones. To address these problems, we present the Arabic Question-Answer Dataset (AQAD), a new large, high-quality Arabic reading comprehension dataset consisting of 17,000+ questions and answers. To collect AQAD, we present a fully automated data collector. The collector works on a set of Arabic Wikipedia articles for the extractive question answering task. The chosen articles match the articles used in the well-known Stanford Question Answering Dataset (SQuAD). We provide evaluation results on AQAD using two state-of-the-art models for machine-reading question answering, BERT and BiDAF, which achieve F1 scores of 0.37 and 0.32, respectively. |
Database: | OpenAIRE |
External link: |
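
Note on the F1 measure: the record does not say how the reported F1 scores were computed. For extractive question answering in the SQuAD style, F1 is commonly a token-overlap score between the predicted and reference answer spans. The sketch below illustrates that idea only. The function name `token_f1` is hypothetical, and plain whitespace tokenization with no Arabic-specific normalization is an assumption, not the authors' evaluation code.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumption: tokens are obtained by whitespace splitting, with no
    normalization (diacritics, punctuation, etc. are left untouched).
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the prediction contains one extra token.
    print(token_f1("مدينة القاهرة", "القاهرة"))
```

A dataset-level score under this scheme would typically average the per-question values, taking the maximum over reference answers when a question has more than one.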