Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech

Authors: Kateřina Rysová, Michal Novák, Jiří Mírovský, Magdaléna Rysová
Publication year: 2019
Subject:
Source: Text, Speech, and Dialogue, ISBN: 9783030279462
TSD
DOI: 10.1007/978-3-030-27947-9_17
Description: The paper contributes to research on the automatic evaluation of surface coherence in student essays. We explore how large unlabeled data can improve the quality of such evaluation. In particular, we propose two ways to benefit from the large data: (i) an n-gram language model, and (ii) density estimates of the features used by the evaluation system. In our experiments, we integrate these approaches, which exploit data from the Czech National Corpus, into EVALD, an evaluator of surface coherence for Czech, and test its performance on two datasets: essays written by native speakers (L1) and by foreign learners of Czech (L2). The system implementing these approaches, together with other new features, significantly outperforms the original EVALD system, with an especially large margin on L1.
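The two ideas named in the description can be illustrated with a small sketch. The description does not say which language-model toolkit, smoothing method, or density estimator the authors used, so everything below is an assumption made for illustration: an add-one-smoothed bigram model and a Gaussian kernel density estimate, with hypothetical function names (`train_bigram_counts`, `avg_log_prob`, `gaussian_kde`). It is not the EVALD implementation.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def avg_log_prob(tokens, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability per transition.

    A value like this could serve as one feature for an essay evaluator
    (an assumption; the paper's actual feature definition may differ).
    """
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(padded) - 1)


def gaussian_kde(samples, bandwidth):
    """Return a density function for a feature, estimated from reference samples.

    Gaussian kernels with a fixed bandwidth are an illustrative choice only.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Toy stand-in for a large unlabeled corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams = train_bigram_counts(corpus)
fluent = avg_log_prob(["the", "cat", "sat"], unigrams, bigrams)
scrambled = avg_log_prob(["sat", "cat", "the"], unigrams, bigrams)

# Feature values observed in the reference corpus, e.g. connectives per sentence.
density = gaussian_kde([1.0, 1.2, 0.9, 1.1], bandwidth=0.2)
```

In this toy setup, word order seen in the corpus scores a higher average log-probability than a scrambled order, and a feature value close to the reference samples receives a higher density than a distant one. Either number could then be passed to a classifier as an additional feature.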
Database: OpenAIRE