Multilingual Non-Factoid Question Answering with Silver Answers

Autor:	Mishra, Ritwik, Vennam, Sreeram, Shah, Rajiv Ratn, Kumaraguru, Ponnurangam
Rok vydání:	2024
Předmět:	Computer Science - Computation and Language Computer Science - Artificial Intelligence Computer Science - Information Retrieval Computer Science - Machine Learning
Druh dokumentu:	Working Paper
Popis:	Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on the manual annotations of 790 QA-pairs from MuNfQuAD (golden set), we observe that 98\% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80\% and 72\%, as well as a macro F1 of 72\% and 66\%, on the MuNfQuAD testset and the golden set, respectively. Furthermore, the APS model effectively generalizes certain a language within the golden set, even after being fine-tuned on silver labels.
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2408.10604 Zobrazit plný text záznamu View this record from Arxiv