Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language
Author: | Saman Namdar, Hesham Faili, Shahram Khadivi |
---|---|
Language: | English |
Publication year: | 2013 |
Subject: | |
Source: | International Journal of Information and Communication Technology Research, Vol 5, Iss 1, Pp 39-52 (2013) |
Document type: | article |
ISSN: | 2251-6107 2783-4425 |
Description: | Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. In such a system, parameters are derived from the analysis of a parallel corpus, and translation quality depends on how well word translations are learned. Enriching SMT with a suitable morphological analyzer dramatically reduces both the number of out-of-vocabulary words and the dictionary size. The effect is more pronounced for a highly inflectional, low-resource language such as Persian. Defining a suitable granularity for word segments may improve alignment quality in the parallel corpus. In this paper, different segmentation schemes and combinations of word segments are explored in a Persian-to-English SMT experiment, and the scheme that yields the best one-to-one alignment, called the En-like scheme, is proposed. Using this scheme, Persian-to-English translation quality improves by about 3 BLEU points over phrase-based SMT. |
Database: | Directory of Open Access Journals |
External link: |