Towards Inducing Document-Level Abilities in Standard Multilingual Neural Machine Translation Models

Autor:	Gumma, Varun, Chitale, Pranjal A., Bali, Kalika
Rok vydání:	2024
Předmět:	Computer Science - Computation and Language
Druh dokumentu:	Working Paper
Popis:	Neural Machine Translation (NMT) models have traditionally used Sinusoidal Positional Embeddings (PEs), which often struggle to capture long-range dependencies and are less efficient for handling extended context or document-level translation tasks. This work addresses the challenge of transitioning pre-trained NMT models from absolute sinusoidal PEs to relative PEs, such as Rotary Positional Embeddings (ROPE) and Attention with Linear Biases (ALIBI), without compromising performance. We demonstrate that parameter-efficient fine-tuning, using only a small amount of high-quality data, can successfully facilitate this transition. Experimental results indicate that switching from sinusoidal to relative PEs results in competitive translation quality on sentence-level evaluation benchmarks. Additionally, models trained with ROPE consistently outperform those using ALIBI and Sinusoidal PEs on document-level benchmarks across both string-based metrics and qualitative evaluations. Moreover, we find that a small amount of long-context data in a few languages is sufficient for cross-lingual length generalization, thereby inducing long-context capabilities. Comment: Under Review
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2408.11382 Zobrazit plný text záznamu View this record from Arxiv