Automatic restoration of diacritics based on word n-grams for Slovak texts

Authors: Matej Mesko, Michal Duracik, Patrik Hrkut, Stefan Toth, Emanuel Zaymus
Publication year: 2019
Subject:
Source: 2019 IEEE 15th International Scientific Conference on Informatics.
DOI: 10.1109/informatics47936.2019.9119328
Description: Many people still write texts without diacritics, especially in chat messages, e-mails, or discussion posts. The habit has historical roots: people ran into text-encoding problems in messages or wanted to type faster. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in the Slovak language. We also evaluate our results and compare them with existing algorithms developed for Slovak texts.
Database: OpenAIRE
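
The description above mentions restoring diacritics with word n-grams but gives no details of the algorithm. The following is only a minimal sketch of the general idea, assuming bigrams (n = 2), a greedy left-to-right choice, and a fallback to single-word frequency; it is not the authors' method. The names `strip_diacritics` and `BigramRestorer` and the two-sentence training corpus are hypothetical, chosen purely for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'súd' -> 'sud'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class BigramRestorer:
    """Hypothetical bigram-based restorer (illustrative, not the paper's algorithm)."""

    def __init__(self) -> None:
        # stripped word -> counts of the accented forms seen in training
        self.candidates = defaultdict(Counter)
        # (previous word, word) -> count, both with diacritics
        self.bigrams = Counter()

    def train(self, sentences) -> None:
        for sentence in sentences:
            prev = "<s>"  # sentence-start marker
            for word in unicodedata.normalize("NFC", sentence).lower().split():
                self.candidates[strip_diacritics(word)][word] += 1
                self.bigrams[(prev, word)] += 1
                prev = word

    def restore(self, text: str) -> str:
        prev = "<s>"
        restored = []
        for word in text.lower().split():
            options = self.candidates.get(strip_diacritics(word))
            if not options:
                best = word  # unknown word: leave unchanged
            else:
                # Prefer the form seen most often after the previous word,
                # then break ties by overall frequency of the form.
                best = max(
                    options,
                    key=lambda form: (self.bigrams[(prev, form)], options[form]),
                )
            restored.append(best)
            prev = best
        return " ".join(restored)


# Toy corpus (assumption): 'súd' (court) and 'sud' (barrel) share one stripped form.
restorer = BigramRestorer()
restorer.train(["najvyšší súd rozhodol", "plný sud vína"])

print(restorer.restore("najvyssi sud rozhodol"))  # najvyšší súd rozhodol
print(restorer.restore("plny sud vina"))          # plný sud vína
```

The preceding word is what decides between "súd" and "sud" here; a practical system would need a large corpus, longer n-grams, and smoothing for unseen contexts.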