Simple, Scalable Adaptation for Neural Machine Translation
Author: | Naveen Arivazhagan, Ankur Bapna, Orhan Firat |
---|---|
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Machine Learning (cs.LG); Computation and Language (cs.CL); machine translation; adapter (computing); machine learning; adaptation (computer science); scalability; artificial intelligence |
Source: | EMNLP/IJCNLP (1) |
Description: | Fine-tuning pre-trained Neural Machine Translation (NMT) models is the dominant approach for adapting to new languages and domains. However, fine-tuning requires adapting and maintaining a separate model for each target task. We propose a simple yet efficient approach for adaptation in NMT. Our proposed approach consists of injecting tiny task-specific adapter layers into a pre-trained model. These lightweight adapters, with just a small fraction of the original model size, adapt the model to multiple individual tasks simultaneously. We evaluate our approach on two tasks: (i) Domain Adaptation and (ii) Massively Multilingual NMT. Experiments on domain adaptation demonstrate that our proposed approach is on par with full fine-tuning across various domains, dataset sizes, and model capacities. On a massively multilingual dataset of 103 languages, our adaptation approach bridges the gap between individual bilingual models and one massively multilingual model for most language pairs, paving the way towards universal machine translation. EMNLP 2019 |
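The abstract describes injecting tiny task-specific adapter layers into a frozen pre-trained model. A minimal NumPy sketch of the commonly used residual bottleneck form is shown below; the layer names, initialization, and bottleneck size are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Simplified LayerNorm over the last axis (no learned scale/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class Adapter:
    """Residual bottleneck adapter: project d_model down to a small
    d_bottleneck, apply a nonlinearity, project back up, and add the
    input. Only these two small matrices would be trained per task,
    while the pre-trained model stays frozen."""
    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (d_bottleneck, d_model))

    def __call__(self, x):
        h = layer_norm(x)
        h = np.maximum(h @ self.w_down, 0.0)  # ReLU in the bottleneck
        # Residual connection: near-zero init keeps the adapted model
        # close to the original pre-trained model at the start.
        return x + h @ self.w_up

# Hypothetical usage: a batch of 2 hidden states of width 512,
# adapted through a 64-unit bottleneck (a small fraction of d_model).
adapter = Adapter(d_model=512, d_bottleneck=64)
x = np.ones((2, 512))
y = adapter(x)
assert y.shape == x.shape
```

One such adapter would typically be inserted after each encoder/decoder sub-layer, with a separate set of adapter weights per domain or language pair, which is how a single shared model can serve multiple tasks at once.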
Database: | OpenAIRE |
External link: |