Is Multilingual BERT Fluent in Language Generation?

Authors: Rönnqvist, Samuel; Kanerva, Jenna; Salakoski, Tapio; Ginter, Filip
Publication year: 2019
Subject:
Source: In Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing (2019)
Document type: Working Paper
Description: The multilingual BERT model is trained on 104 languages and is meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to its monolingual counterparts, and in many cases cannot serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular for the Nordic languages.
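As an illustration of the cloze setup described above, a minimal sketch using the Hugging Face transformers library and the public bert-base-multilingual-cased checkpoint might look like the following. This is an assumed toolchain for demonstration purposes, not the authors' original evaluation code; the example sentence is likewise hypothetical.

```python
# Minimal cloze-style probe of multilingual BERT: mask one word in a sentence
# and ask the masked language model to predict candidate fillers.
# Assumes the Hugging Face `transformers` and `torch` packages are installed;
# this is an illustrative sketch, not the paper's evaluation pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# A sentence with one gap, marked with the model's mask token.
sentence = f"The capital of Finland is {tokenizer.mask_token}."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the top-5 candidate fillers.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5, dim=-1).indices[0]
print([tokenizer.decode(token_id) for token_id in top_ids])
```

Running the same probe with a well-trained monolingual checkpoint in place of the multilingual one gives a rough sense of the gap the paper measures.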
Database: arXiv