A Rule-Based Approach For Aligning Japanese-Spanish Sentences From A Comparable Corpora

Author:	Ramírez, Jessica C.; Matsumoto, Yuji
Year of publication:	2012
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence
Document type:	Working Paper
Description:	The performance of a Statistical Machine Translation (SMT) system is directly proportional to the quality and size of the parallel corpus it uses. However, for some language pairs such corpora are scarce. Our long-term goal is to construct a Japanese-Spanish parallel corpus for use in SMT, since useful Japanese-Spanish parallel corpora are lacking. To address this problem, we propose a method for extracting Japanese-Spanish parallel sentences from Wikipedia using POS tagging and a rule-based approach. The approach focuses on the syntactic features of both languages. Human evaluation performed on a sample shows promising results in comparison with the baseline. Comment: International Journal on Natural Language Computing (IJNLC) Vol.1, No.3, October 2012
Database:	arXiv
External link:	http://arxiv.org/abs/1211.4488