EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain

Author:	Hadifar, Amir; Bitew, Semere Kiros; Deleu, Johannes; Develder, Chris; Demeester, Thomas
Year of publication:	2022
Subject:	Computer Science - Computation and Language
Document type:	Working Paper
Description:	We introduce a high-quality dataset that contains 3,397 samples comprising (i) multiple choice questions, (ii) answers (including distractors), and (iii) their source documents, from the educational domain. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure that they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines will be released to support further research in question generation.
Database:	arXiv
External link:	http://arxiv.org/abs/2210.06104
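
The description above lists what each sample contains: a question in two forms, a correct answer with distractors, a source document with sentence-level links to the answer, and, for some questions, a Bloom's taxonomy level. A minimal Python sketch of such a record might look like the following. The class name, field names, and toy sample are assumptions made for illustration only; they are not taken from the released dataset and may not match its actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EduQGSample:
    """Hypothetical record layout; all field names are illustrative assumptions."""
    question_normal: str            # question phrased as a regular question
    question_cloze: str             # same question phrased as fill-in-the-blank
    correct_answer: str
    distractors: List[str]          # incorrect answer options
    source_sentences: List[str]     # source document split into sentences
    answer_sentence_ids: List[int]  # indices of sentences supporting the answer
    bloom_level: Optional[str] = None  # only some questions carry this label

    def options(self) -> List[str]:
        """Return all answer options, correct answer first."""
        return [self.correct_answer] + self.distractors

    def supporting_sentences(self) -> List[str]:
        """Return the source sentences linked to the correct answer."""
        return [self.source_sentences[i] for i in self.answer_sentence_ids]


# Toy sample, invented for this sketch (not from the dataset).
sample = EduQGSample(
    question_normal="Which organelle produces most of a cell's ATP?",
    question_cloze="Most of a cell's ATP is produced by the _____.",
    correct_answer="mitochondrion",
    distractors=["ribosome", "nucleus", "Golgi apparatus"],
    source_sentences=[
        "Cells contain many organelles.",
        "Mitochondria produce most of the cell's ATP.",
    ],
    answer_sentence_ids=[1],
    bloom_level=None,
)
```

With a layout of this kind, question generation would read the supporting sentences and target one of the question forms, distractor generation would target the `distractors` list, and format conversion would map one question form to the other.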