Author:
Rönnqvist, Samuel, Kanerva, Jenna, Salakoski, Tapio, Ginter, Filip
Year of publication:
2019
Subject:

Source:
In Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing (2019)
Document type:
Working Paper
Description:
The multilingual BERT model is trained on 104 languages and is meant to serve as a universal language model and a tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification task probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to its monolingual counterparts and in many cases cannot serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular for the Nordic languages.
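As a rough illustration of the cloze-style probe described above (not the paper's own evaluation code or data), the publicly released multilingual BERT checkpoint can be asked to fill a masked token via the Hugging Face transformers pipeline; the checkpoint name and example sentence below are assumptions chosen for demonstration only.

```python
# Minimal sketch of a cloze-style probe with multilingual BERT.
# Assumptions: the "bert-base-multilingual-cased" checkpoint and the example
# sentence are illustrative only and do not reproduce the paper's setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Ask the model to fill the gap and print its top candidates with scores.
for prediction in fill_mask("The capital of Finland is [MASK]."):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```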
Database:
arXiv
External link: