Bilingual Markov Reordering Labels for Hierarchical SMT
Author: | Maillette de Buij Wenniger, G., Sima'an, K., Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. |
---|---|
Contributors: | Language and Computation (ILLC, FNWI/FGw), ILLC (FNWI), Brain and Cognition |
Language: | English |
Year of publication: | 2014 |
Source: | Proceedings of SSST-8: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation: EMNLP 2014/SIGMT/SIGLEX Workshop, 25 October 2014, Doha, Qatar, pp. 11-21 |
Description: | Earlier work on labeling Hiero grammars with monolingual syntax reports improved performance, suggesting that such labeling may affect phrase reordering as well as lexical selection. In this paper we explore the idea of inducing bilingual labels for Hiero grammars without using any resources beyond those that the original Hiero itself uses. Our bilingual labels aim to capture salient patterns of phrase reordering in the parallel training corpus. These labels originate from hierarchical factorizations of the word alignments in Hiero’s own training data. We take a Markovian view of synchronous top-down derivations over these factorizations, which allows us to extract 0th- and 1st-order bilingual reordering labels. Using exactly the same training data as Hiero, we show that the Markovian interpretation of word alignment factorization offers major benefits over the unlabeled version. We report extensive experiments with strict and soft bilingual labeled Hiero, showing improvements of up to 1 BLEU point for Chinese-English and about 0.1 BLEU points for German-English. |
Database: | OpenAIRE |
External link: |