ReaderBench: Multilevel analysis of Russian text characteristics
Authors: | Dragos Corlatescu, Ștefan Ruseti, Mihai Dascalu |
---|---|
Language: | English, Russian |
Year of publication: | 2022 |
Subject: | |
Source: | Russian Journal of Linguistics, Vol 26, Iss 2, Pp 342-370 (2022) |
Document type: | article |
ISSN: | 2687-0088 2686-8024 |
DOI: | 10.22363/2687-0088-30145 |
Description: | This paper introduces an adaptation of the open-source ReaderBench framework that now supports multilevel analyses of Russian text characteristics, integrating both textual complexity indices and state-of-the-art language models, namely Bidirectional Encoder Representations from Transformers (BERT). The proposed processing pipeline was evaluated on a dataset containing Russian texts from two language levels for foreign learners (A - Basic user and B - Independent user). Our experiments showed that the ReaderBench complexity indices differentiate significantly between the two language levels from two perspectives: a) a statistical perspective, where a Kruskal-Wallis analysis was performed and features such as the “nmod” dependency tag or the number of nouns at the sentence level proved to be the most predictive; and b) a neural network perspective, where our model combining textual complexity indices and contextualized embeddings obtained an accuracy of 92.36% in a leave-one-text-out cross-validation, outperforming the BERT baseline. ReaderBench can be employed by designers and developers of educational materials to evaluate and rank materials based on their difficulty, as well as by a larger audience for assessing text complexity in different domains, including law, science, or politics. |
Database: | Directory of Open Access Journals |