Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders
Author: | Shyam Sunder Agrawal, Karunesh Arora |
---|---|
Year of publication: | 2021 |
Subject: | |
Source: | ACM Transactions on Asian and Low-Resource Language Information Processing. 20:1-18 |
ISSN: | 2375-4702 2375-4699 |
DOI: | 10.1145/3448252 |
Description: | English and Hindi have significantly different word orders. English follows the subject-verb-object (SVO) order, while Hindi primarily follows the subject-object-verb (SOV) order. This difference poses challenges to modeling this pair of languages for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems can handle local reorderings, and while some phrase-level reorderings are carried out during the formation of phrases, they are weak in learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a step in pre-processing to render the reordered source sentence closer to the target language in terms of word order. Such approaches focus on using part-of-speech (POS) tag sequences and reordering the syntax tree by using grammatical rules, or through head finalization. This study shows that mere head finalization is not sufficient for the reordering of sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings with the original and the head-finalized representations. The impact of the reordering on the quality of translation is measured through the BLEU score in phrase-based statistical systems and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs. |
Database: | OpenAIRE |
External link: |
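
The head finalization mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the hand-built toy tree, and the helper names are assumptions made for illustration, and a real system would take its trees from a parser. The idea shown is only the baseline one, that every head word is moved after all of its dependents, which turns an English SVO sentence into an SOV-like order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A word plus its dependents, split by the side they occupy in English.

    Hypothetical toy structure, not taken from the paper.
    """
    word: str
    left: List["Node"] = field(default_factory=list)
    right: List["Node"] = field(default_factory=list)


def original_order(node: Node) -> List[str]:
    """Read the words back in their English order."""
    words: List[str] = []
    for child in node.left:
        words.extend(original_order(child))
    words.append(node.word)
    for child in node.right:
        words.extend(original_order(child))
    return words


def head_finalize(node: Node) -> List[str]:
    """Emit all dependents first (recursively), then the head word."""
    words: List[str] = []
    for child in node.left + node.right:
        words.extend(head_finalize(child))
    words.append(node.word)
    return words


# Toy tree for "John eats apples in the garden", built by hand.
tree = Node(
    "eats",
    left=[Node("John")],
    right=[
        Node("apples"),
        Node("in", right=[Node("garden", left=[Node("the")])]),
    ],
)

print(" ".join(original_order(tree)))  # John eats apples in the garden
print(" ".join(head_finalize(tree)))   # John apples the garden in eats
```

In the reordered output the verb is final and the preposition follows its noun phrase, which is closer to Hindi word order. As the description notes, the study finds that head finalization alone is not sufficient for English-Hindi, so this sketch covers only the starting point that the construct-specific reorderings are compared against.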