DAQAS: Deep Arabic Question Answering System based on duplicate question detection and machine reading comprehension

Author: Hamza Alami, Abdelkader El Mahdaouy, Abdessamad Benlahbib, Noureddine En-Nahnahi, Ismail Berrada, Said El Alaoui Ouatik
Language: English
Publication year: 2023
Subject:
Source: Journal of King Saud University: Computer and Information Sciences, Vol 35, Iss 8, Pp 101709- (2023)
Document type: article
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2023.101709
Description: Recently, deep learning techniques have outperformed feature-based and shallow learning techniques in the field of open-domain question answering systems (OpenQAS). However, only a few works have adopted these techniques to build Arabic OpenQAS that can extract exact answers from large information sources (e.g., Wikipedia). In addition, no available Arabic OpenQAS integrates a module that identifies duplicate questions to accelerate response time and reduce computation cost. In this paper, we propose an Arabic OpenQAS (named DAQAS) based on deep learning methods. It consists of three components: (1) a Dense Duplicate Question Detection module, which returns answers to questions that have already been answered; (2) a Retriever based on BM25 and query expansion by neural text generation; and (3) a Reader that extracts exact answers given a question and the retrieved passages that probably contain the answer. All components of our system integrate deep learning models, especially transformer-based techniques, which have achieved state-of-the-art results in different NLP fields. We performed several experiments with publicly available question answering datasets to show the effectiveness of our system. DAQAS obtained promising results, achieving 21.77% Exact Match and a 54.71% F1 score when using only the top 5 retrieved passages.
Database: Directory of Open Access Journals
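
The three-component flow summarized in the description (duplicate question check, BM25 retrieval, reader) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `tokenize`, `BM25`, `cosine`, and `QAPipeline`, the similarity threshold of 0.9, and the BM25 parameters `k1=1.5` and `b=0.75` are all assumptions made for illustration. A bag-of-words cosine similarity stands in for the dense duplicate detector, the reader is a caller-supplied placeholder function rather than a transformer model, and query expansion by neural text generation is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real Arabic text needs proper preprocessing).
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 ranking over a small in-memory passage list."""

    def __init__(self, passages, k1=1.5, b=0.75):
        self.passages = passages
        self.docs = [tokenize(p) for p in passages]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, doc):
        tf = Counter(doc)
        total = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def top_k(self, question, k=5):
        q = tokenize(question)
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: self.score(q, self.docs[i]), reverse=True)
        return [self.passages[i] for i in ranked[:k]]


def cosine(a, b):
    # Bag-of-words cosine similarity; a stand-in for dense question embeddings.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


class QAPipeline:
    def __init__(self, passages, reader, threshold=0.9):
        self.retriever = BM25(passages)
        self.reader = reader        # placeholder callable: (question, passages) -> answer
        self.threshold = threshold  # hypothetical duplicate-similarity cutoff
        self.answered = {}          # previously answered question -> answer

    def answer(self, question, k=5):
        # (1) Duplicate detection: reuse the stored answer of a near-identical question.
        for past_question, past_answer in self.answered.items():
            if cosine(question, past_question) >= self.threshold:
                return past_answer
        # (2) Retrieval: keep the top-k passages by BM25 score.
        passages = self.retriever.top_k(question, k)
        # (3) Reading: delegate answer extraction to the supplied reader.
        result = self.reader(question, passages)
        self.answered[question] = result
        return result
```

In this sketch, a repeated question is answered from the stored results without running retrieval or the reader again, which is the source of the response-time and computation savings the description attributes to duplicate question detection.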