UzBERT: pretraining a BERT model for Uzbek

Authors: Mansurov, B., Mansurov, A.
Year of publication: 2021
Document type: Working Paper
Description: Pretrained language models based on the Transformer architecture have achieved state-of-the-art results in various natural language processing tasks such as part-of-speech tagging, named entity recognition, and question answering. However, no such monolingual model for the Uzbek language is publicly available. In this paper, we introduce UzBERT, a pretrained Uzbek language model based on the BERT architecture. Our model greatly outperforms multilingual BERT on masked language model accuracy. We make the model publicly available under the MIT open-source license.
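
The description notes that UzBERT is a BERT-style masked language model released publicly. Below is a minimal sketch of how such a model could be queried with the Hugging Face transformers fill-mask pipeline; the model identifier and the example sentence are assumptions for illustration, not taken from this record.

```python
# A minimal sketch of masked-token prediction with a BERT-style Uzbek model.
# The model identifier below is an assumption (not confirmed by this record);
# substitute the identifier under which UzBERT is actually published.
from transformers import pipeline

# Load the masked-language-model head behind a fill-mask pipeline.
unmasker = pipeline("fill-mask", model="coppercitylabs/uzbert-base-uncased")

# Ask the model to fill the [MASK] token in an Uzbek sentence (example sentence is illustrative).
predictions = unmasker("Toshkent O'zbekistonning [MASK] shahri.")
for p in predictions:
    print(p["token_str"], p["score"])
```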
Comment: 9 pages, 1 table
Database: arXiv