English-Chinese Sentence Alignment for Patent Documents and Its Applications (英漢專利文書文句對列與應用)

Author: 田侃文
Subject:
Document type: Text
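Description: With the current trend toward globalization, countries around the world are translating patent documents across languages. For patent translation and cross-language retrieval, a large and accurate parallel corpus of patent documents supports related research. Aligning the sentences of a parallel corpus by hand is very time-consuming, so this study develops a Chinese-English sentence alignment system. The system applies preprocessing techniques such as sentence segmentation, word segmentation, and English stemming. It uses a Chinese-English technical term table, adjusts the weights of matched word pairs according to term-frequency statistics, and adds the cosine similarity between sentences as a supplementary measure when computing the similarity of Chinese and English sentences. Finally, a dynamic programming algorithm selects the best combination of alignments. Alignment performance is evaluated by precision and recall. The aligned sentence pairs are then used as training data for word reordering in a computer-assisted machine translation system, with test items from the 2003 Trends in International Mathematics and Science Study (TIMSS) as the translation target, and translation quality is evaluated with BLEU and NIST. Experimental results show that the system reaches a precision of 0.995 in the 1:1 alignment mode, and that the large set of Chinese-English sentence pairs selected with a threshold does improve the translation quality of the computer-assisted machine translation system.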
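The final alignment step can be illustrated with a minimal Python sketch. It assumes the sentences have already been split, segmented, and stemmed into token lists. The toy term table `TERM_TABLE`, the inverse-frequency weights, the set of allowed beads, and the penalty constants are all hypothetical stand-ins, not the thesis's actual resources or values. Projecting English terms into Chinese through the table before taking the cosine is likewise only one plausible reading of the cross-language cosine step.

```python
import math
from collections import Counter

# Hypothetical toy term table (stemmed English -> Chinese); the thesis uses a
# real Chinese-English technical-term table.
TERM_TABLE = {"sensor": "感測器", "signal": "訊號", "circuit": "電路"}

# Allowed beads (number of English sentences, number of Chinese sentences); assumed.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_SCORE = -0.5     # hypothetical score for leaving a sentence unaligned
MERGE_PENALTY = 0.5   # hypothetical penalty for 2:1 and 1:2 beads


def term_weights(en_sents):
    """Inverse-frequency weights: rarer English terms count more (assumed scheme)."""
    freq = Counter(w for s in en_sents for w in s)
    return {w: 1.0 / c for w, c in freq.items()}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def bead_score(en_group, zh_group, weights):
    """Two-part score: weighted matched term pairs plus a cross-language cosine."""
    if not en_group or not zh_group:
        return SKIP_SCORE
    en_tokens = [w for s in en_group for w in s]
    zh_tokens = [w for s in zh_group for w in s]
    zh_set = set(zh_tokens)
    # Part 1: weighted count of English terms whose table entry occurs on the Chinese side.
    lexical = sum(weights[w] for w in en_tokens if TERM_TABLE.get(w) in zh_set)
    # Part 2: project English terms into Chinese via the table, then take the cosine.
    projected = Counter(TERM_TABLE[w] for w in en_tokens if w in TERM_TABLE)
    similarity = cosine(projected, Counter(zh_tokens))
    penalty = MERGE_PENALTY if len(en_group) + len(zh_group) > 2 else 0.0
    return lexical + similarity - penalty


def align(en_sents, zh_sents):
    """Dynamic programming over beads; returns (English indices, Chinese indices) pairs."""
    weights = term_weights(en_sents)
    m, n = len(en_sents), len(zh_sents)
    best = [[-math.inf] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if best[i][j] == -math.inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > m or nj > n:
                    continue
                score = best[i][j] + bead_score(en_sents[i:ni], zh_sents[j:nj], weights)
                if score > best[ni][nj]:
                    best[ni][nj] = score
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the best-scoring bead sequence.
    beads, i, j = [], m, n
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


EN = [["the", "sensor", "detect", "signal"], ["the", "circuit", "amplifi", "signal"]]
ZH = [["感測器", "偵測", "訊號"], ["電路", "放大", "訊號"]]
print(align(EN, ZH))  # → [([0], [0]), ([1], [1])]
```

The merge penalty keeps the sum-of-scores objective from favoring 2:1 or 1:2 beads simply because they contain more matched terms. A threshold on `bead_score` could then filter the aligned pairs before they are used as training data, but the threshold value here would again be an assumption.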
Cross-language translation of patent documents has grown substantially in importance as a result of globalization. Accurately aligned parallel corpora support research that depends on bilingual data, such as computer-aided translation and cross-language information retrieval. Because aligning parallel data by hand is time-consuming, an English-Chinese sentence alignment system was built to automate the process. The system relies on several natural language processing preprocessing techniques, such as stemming of English words. The alignment score for a sentence pair has two components: the first considers the number and weight of aligned word pairs in the Chinese and English sentences, and the second is a cosine similarity computed specifically for Chinese-English sentence pairs. Precision and recall were used to evaluate the quality of the alignment results, and the 1:1 alignment achieved a precision of 0.995. In addition, the aligned sentences were used as training data for a machine translation system applied to TIMSS test items, and experimental results show that they improve the system's translation quality.
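Precision here is the share of proposed alignments that match a gold standard, and recall is the share of gold alignments the system recovers. The following minimal sketch assumes that each alignment is represented as a pair of sentence-index lists. The optional `mode` filter is one hypothetical way to report a per-mode figure such as the 1:1 precision; it is not taken from the thesis.

```python
def precision_recall(proposed, gold, mode=None):
    """Precision and recall of proposed alignment beads against gold beads.

    Each bead is (English sentence indices, Chinese sentence indices). If `mode`
    is given, e.g. (1, 1), only beads of that shape are counted on both sides.
    """
    def as_set(beads):
        found = {(tuple(e), tuple(z)) for e, z in beads}
        if mode is not None:
            found = {b for b in found if (len(b[0]), len(b[1])) == mode}
        return found

    system, reference = as_set(proposed), as_set(gold)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


proposed = [([0], [0]), ([1], [1]), ([2], [3])]
gold = [([0], [0]), ([1], [1]), ([2], [2]), ([3], [3])]
print(precision_recall(proposed, gold))  # → (0.6666666666666666, 0.5)
```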
Database: Networked Digital Library of Theses & Dissertations