Textometr: an online tool for automated complexity level assessment of texts for Russian language learners
Author: | Maria Yu. Lebedeva, Antonina Laposhina |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
Linguistics and Language; Vocabulary; computational linguodidactics; Computer science; Process (engineering); Foreign language; web tools; russian as a foreign language; Language and Linguistics; text adapting; Education; Task (project management); computer assisted language learning; reading; Reading (process); PG1-9665; russian language learning; Test (assessment); Word lists by frequency; text complexity; Scale (social sciences); Artificial intelligence; educational text; Slavic languages. Baltic languages. Albanian languages; Natural language processing |
Source: | Russian Language Studies, Vol 19, Iss 3, Pp 331-345 (2021) |
ISSN: | 2618-8171 2618-8163 |
Description: | Evaluating text accessibility appears to be an urgent and labor-intensive task when preparing texts for teaching Russian as a foreign language. At the same time, the procedure for assigning a text to one of the levels on the CEFR scale (from A1 to C2) is well formalized and described in the professional literature, which opens opportunities for automating it. This paper presents Textometr, a new free web-based tool that estimates the CEFR level of any given Russian text and reports other key statistics relevant for adapting the text for foreign students. The automated level assessment is based on a regression model trained on a dataset of more than 800 texts from Russian textbooks for foreigners, using several machine learning and natural language processing methods. In addition to the CEFR level, the tool provides information relevant for adapting the text to educational tasks: lists of keywords and of words for a potential vocabulary list, statistics on text coverage by frequency lists and CEFR-graded vocabulary lists (lexical minima), a frequency list of the text, and an estimate of the time needed for reading. The tool's shortcomings at the current stage of development and suggested ways to address them are also discussed. Finally, the results of testing the tool's quality and the directions for its further development are reported. Textometr can provide helpful information not only to teachers and guidance teachers, but also to textbook authors and publishers, who can use it to check that a text's content matches the declared level and educational goals. |
Database: | OpenAIRE |
External link: |
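
The description mentions statistics on text coverage by frequency lists and CEFR-graded vocabulary lists. As a rough illustration of what such a coverage statistic could look like, here is a minimal Python sketch. It is not Textometr's implementation: the function names `tokenize` and `coverage`, the toy word list `TOY_LIST`, and the regex-based tokenization are all assumptions made for this example, and the sketch matches surface word forms without lemmatization, which a production tool for Russian would presumably need.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of Cyrillic letters (hypothetical tokenizer)."""
    # Hyphenated words are split into parts; digits and Latin letters are ignored.
    return re.findall(r"[а-яё]+", text.lower())


def coverage(text, word_list):
    """Return the share of tokens in `text` that appear in `word_list` (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in word_list)
    return known / len(tokens)


# Toy stand-in for a graded vocabulary list; a real lexical minimum is far larger
# and would be matched against lemmas rather than inflected forms.
TOY_LIST = {"я", "и", "в", "дом", "читаю", "книгу"}

sample = "Я читаю книгу в библиотеке."
share = coverage(sample, TOY_LIST)
print(f"Coverage: {share:.0%}")
```

In this toy case four of the five tokens are on the list, so the reported coverage is 80%. Computing the same ratio against several lists graded by level would give a per-level coverage profile of the kind the description refers to.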