In Neural Machine Translation, What Does Transfer Learning Transfer?
Author: Alham Fikri Aji, Rico Sennrich, Nikolay Bogoychev, Kenneth Heafield
Contributors: University of Zurich
Language: English
Year of publication: 2020
Subject: Information transfer; Machine translation; Transfer of learning; Transformer (machine learning model); Theoretical computer science; Computer science; Linguistics; Artificial intelligence & image processing; Institute of Computational Linguistics
Source: ACL: Aji, A. F., Bogoychev, N., Heafield, K. & Sennrich, R. 2020, 'In Neural Machine Translation, What Does Transfer Learning Transfer?', in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7701–7710. 2020 Annual Conference of the Association for Computational Linguistics, virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.688
Description: Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to gain a black-box understanding of transfer learning. Word embeddings play an important role in transfer learning, particularly if they are properly aligned. Although transfer learning can be performed without embeddings, results are sub-optimal. In contrast, transferring only the embeddings but nothing else yields catastrophic results. We then investigate diagonal alignments with auto-encoders over real languages and randomly generated sequences, finding that even randomly generated sequences used as parents yield noticeable, though smaller, gains. Finally, transfer learning can eliminate the need for a warm-up phase when training transformer models on high-resource language pairs.
Database: OpenAIRE
External link:
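
The abstract describes ablations that restrict which parameters are carried over from a trained parent model to a low-resource child model, in particular whether word embeddings are transferred and whether they are aligned across vocabularies. The following is a minimal sketch of that kind of parameter transfer, assuming models are represented as plain dictionaries of NumPy arrays; the function name `transfer_parameters`, the `transfer_embeddings` flag, and the `vocab_map` argument are illustrative assumptions, not the paper's released code.

```python
import numpy as np


def transfer_parameters(parent_params, child_params,
                        transfer_embeddings=True, vocab_map=None):
    """Initialise child parameters from a trained parent model.

    parent_params / child_params: dict[str, np.ndarray]
    transfer_embeddings: if False, embedding matrices keep their random
        child initialisation (one of the paper's ablations).
    vocab_map: optional dict mapping child token ids to parent token ids,
        used to align embedding rows when the vocabularies differ.
    (All names here are illustrative, not from the paper's code.)
    """
    for name, child_value in child_params.items():
        if name not in parent_params:
            continue  # parameter only exists in the child; keep random init
        parent_value = parent_params[name]

        if "embed" in name:
            if not transfer_embeddings:
                continue  # ablation: transfer everything except embeddings
            if vocab_map is not None:
                # copy only the rows whose tokens are shared/aligned
                for child_id, parent_id in vocab_map.items():
                    child_value[child_id] = parent_value[parent_id]
                continue

        # non-embedding parameters: copy wholesale when shapes match
        if parent_value.shape == child_value.shape:
            child_params[name] = parent_value.copy()
    return child_params
```

Usage would amount to loading the parent checkpoint into `parent_params`, building a randomly initialised `child_params` for the low-resource pair, and calling the function with `transfer_embeddings` and `vocab_map` set according to which ablation is being reproduced before training the child model as usual.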