Developing MCQA Framework for Basic Science Subjects using Distributed Similarity Model and Classification Based Approaches
Authors: | Dipankar Das, Sandip Sarkar, David Eduardo Pinto Avendaño, Partha Pakray |
---|---|
Year of publication: | 2020 |
Subject: | 0209 industrial biotechnology; Computer science; 02 engineering and technology; Similitude; 020901 industrial engineering & automation; Semantic similarity; 0202 electrical engineering, electronic engineering, information engineering; Question answering; 020201 artificial intelligence & image processing; Artificial intelligence; Natural language processing; Multiple choice |
Source: | International Journal of Asian Language Processing. 30:2050015 |
ISSN: | 2424-791X, 2717-5545 |
Abstract: | In this paper, we propose a novel approach to improving the performance of multiple choice question answering (MCQA) systems using distributed semantic similarity and a classification approach. We focus mainly on science-based MCQs, which are particularly difficult to handle. Our method is based on the hypothesis that, in a distributional semantic model, a question is more strongly related to its correct answer than to the other options. We use the IJCNLP Shared Task 5 and SciQ datasets for our experiments. We built three models (Model 1, Model 2 and Model 3) according to the dataset format. The basic difference between the two datasets is that SciQ provides supporting text with its questions, whereas IJCNLP Task 5 does not. Model 1 and Model 2 are built mainly for the IJCNLP Task 5 dataset, whereas Model 3 is built mainly for SciQ. Model 1 is the basic MCQA model and cannot capture dependencies between options; Model 2 is designed to handle such dependencies (e.g., "all of these", "two of them", "none of them"). We also compare results on the SciQ dataset with supporting text (using Model 3) and without it (using Model 1), and we compare our system with other existing methods. Although our method does not perform satisfactorily in some cases, it is simple and robust, which makes it easier to integrate into complex applications. This work investigates different techniques for choosing the correct answer to a given question in an MCQA system, and these experiments may therefore help improve current science-based question answering (QA) systems. On the IJCNLP Task 5 dataset, we achieved 44.5% using Model 2 and the PubMed dataset; on the SciQ dataset, we achieved 82.25% using Model 3 and the PubMed dataset. |
Database: | OpenAIRE |
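The core hypothesis in the abstract (the correct option is the one most similar to the question in a distributional semantic space) can be illustrated with a minimal sketch of a basic, Model 1-style selector. This is not the authors' implementation: the toy vectors in the hypothetical `EMBEDDINGS` table, the averaging of word vectors into a text vector, cosine similarity as the relatedness score, and the helper names `average_vector`, `cosine` and `choose_answer` are all assumptions for illustration. The paper's models rely on vectors derived from corpora such as PubMed and also use a classification approach that is not shown here.

```python
import math
import re
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embedding table (assumption). A real system would load
# vectors trained on a large corpus instead of hand-written 3-d vectors.
EMBEDDINGS: Dict[str, Vector] = {
    "photosynthesis": [0.9, 0.1, 0.0],
    "plants": [0.8, 0.2, 0.1],
    "chlorophyll": [0.85, 0.15, 0.05],
    "magma": [0.05, 0.0, 0.95],
    "mitochondria": [0.3, 0.8, 0.1],
}


def average_vector(text: str, embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    dim = len(next(iter(embeddings.values())))
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t in embeddings]
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for token in tokens:
        for i, x in enumerate(embeddings[token]):
            total[i] += x
    return [x / len(tokens) for x in total]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_answer(question: str, options: Sequence[str],
                  embeddings: Dict[str, Vector]) -> int:
    """Return the index of the option whose vector is closest to the question's."""
    q_vec = average_vector(question, embeddings)
    scores = [cosine(q_vec, average_vector(opt, embeddings)) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


question = "Which pigment do plants use for photosynthesis?"
options = ["chlorophyll", "magma", "mitochondria"]
print(options[choose_answer(question, options, EMBEDDINGS)])  # prints "chlorophyll"
```

Options such as "all of these" or "none of them" receive no meaningful score under this scheme, because their words carry little content of their own; this is the gap the abstract says Model 2 is designed to address.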