Constructing a Turkish-English parallel treebank
Author: | Razieh Ehsani, Ercan Solak, Olcay Taner Yıldız, Onur Görgün |
---|---|
Contributors: | Işık University, Faculty of Engineering, Department of Computer Engineering; Yıldız, Olcay Taner; Solak, Ercan; Görgün, Onur; Ehsani, Razieh |
Subject: | Translation, Statistical machine translation, Machine translation, Computer science, Turkish, Treebank, Treebanks, Computational linguistics, Trees, Tree (data structure), English sentences, Punctuation, Computing methodologies: Document and text processing, Artificial intelligence, Natural language processing |
Source: | Scopus-Elsevier, ACL (2) |
Description: | In this paper, we report our preliminary efforts to build an English-Turkish parallel treebank corpus for statistical machine translation. We manually generated parallel trees for about 5,000 sentences from the Penn Treebank. The English sentences in our set have at most 15 tokens, including punctuation. We restricted tree translation to two operations: reordering the children of a node and replacing leaf nodes with appropriate glosses. We also report the tools that we built and used for the tree translation task. |
Database: | OpenAIRE |
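The description limits tree translation to two operations: permuting the children of a node and replacing leaf nodes with glosses. It also caps English sentences at 15 tokens, punctuation included. The following Python sketch illustrates both constraints under stated assumptions. The `Tree` class, the functions `translate`, `within_length_limit` and `verb_final`, the verb-final reordering rule, and the Turkish glosses are all hypothetical. They are not the authors' tools or data. The sketch also counts every leaf as one token and ignores Penn Treebank empty elements.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tree:
    """Hypothetical constituency-tree node: internal nodes hold a phrase or
    POS label; leaves (nodes without children) hold a word."""
    label: str
    children: list[Tree] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> list[str]:
        """Left-to-right list of leaf words, punctuation included."""
        if self.is_leaf():
            return [self.label]
        words: list[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words


def within_length_limit(tree: Tree, max_tokens: int = 15) -> bool:
    """Length filter (assumption: every leaf counts as one token)."""
    return len(tree.leaves()) <= max_tokens


def translate(tree: Tree, gloss: dict[str, str],
              reorder: Callable[[Tree], list[int]]) -> Tree:
    """Build a target tree using only the two permitted operations:
    replace each leaf with its gloss and permute each node's children."""
    if tree.is_leaf():
        # Unknown words are kept unchanged in this sketch.
        return Tree(gloss.get(tree.label, tree.label))
    kids = [translate(child, gloss, reorder) for child in tree.children]
    perm = reorder(tree)
    if sorted(perm) != list(range(len(kids))):
        raise ValueError(f"not a permutation of {len(kids)} children: {perm}")
    # Internal labels are copied unchanged; only the child order may differ.
    return Tree(tree.label, [kids[i] for i in perm])


def verb_final(node: Tree) -> list[int]:
    """Toy rule (an assumption, not from the paper): in a two-child VP,
    place the object before the verb, since Turkish is verb-final."""
    if node.label == "VP" and len(node.children) == 2:
        return [1, 0]
    return list(range(len(node.children)))


# Usage: "I read books ." with illustrative glosses.
src = Tree("S", [
    Tree("NP", [Tree("PRP", [Tree("I")])]),
    Tree("VP", [Tree("VBD", [Tree("read")]),
                Tree("NP", [Tree("NNS", [Tree("books")])])]),
    Tree(".", [Tree(".")]),
])
tgt = translate(src, {"I": "Ben", "read": "okudum", "books": "kitap"}, verb_final)
print(within_length_limit(src), tgt.leaves())
# True ['Ben', 'kitap', 'okudum', '.']
```

The design keeps the target tree isomorphic to the source up to child order. Every node label survives, so the English and Turkish trees stay aligned node by node.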