A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
Author: | Jiun-Hung Lin, 林浚弘 |
---|---|
Year of publication: | 2006 |
Document type: | 學位論文 ; thesis |
Description: | 94 Recently, researchers have proposed several effective search-result-based term translation extraction methods that mine translations of unknown query terms from Web search results. However, these methods often suffer from data sparseness and indirect association errors when extracting translations of infrequent unknown terms. Therefore, in this paper we present a multi-stage translation extraction method to mitigate these problems. The main results of this paper are as follows: • We propose an improved Web-based term translation extraction model that outperforms the Web-based term translation extraction method proposed by Cheng et al. (2004). Compared with the method of Cheng et al., our experimental results show that the improved method raises the top-1 translation inclusion rate by about 15 percentage points (from 36% to 51%) for English-to-Chinese (E-C) translation of unknown terms and by about 14 percentage points (from 28% to 42%) for Chinese-to-English (C-E) translation of unknown terms. • We are the first to propose a multi-stage translation extraction method for the translation of unknown terms. Unknown terms are classified according to their linguistic features, and each type is handled by a translation extraction stage suited to it. For example, we present a two-stage hybrid transliteration extraction method for transliterated terms and a search-result-based abbreviation translation extraction method for abbreviated terms.
• To further reduce data sparseness and indirect association errors when extracting translations of infrequent unknown terms, we present an improved extraction method that uses second-round search results, which may contain clearer information and more correct translation pairs, to improve the translation performance for infrequent unknown terms. Our experimental results show that this method effectively improves the translation performance for unknown terms. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
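
The second-round search idea in the description can be illustrated with a minimal Python sketch. It is not the thesis's actual model: the scoring here is plain occurrence counting, and the function names, the `min_count` threshold, the query format, and the toy index (including the made-up term `hypoterm`) are all assumptions introduced for illustration. Classification of terms by type is not modelled.

```python
import re
from collections import Counter
from typing import Callable, List

# Runs of CJK ideographs are treated as translation candidates (an assumption).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def count_candidates(snippets: List[str]) -> Counter:
    """Count how often each CJK run appears across the given snippets."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(CJK_RUN.findall(snippet))
    return counts


def extract_translations(
    term: str,
    search: Callable[[str], List[str]],
    min_count: int = 2,  # hypothetical sparseness threshold
    top_k: int = 3,
) -> List[str]:
    """Return up to top_k candidates, using a second search round if data is sparse."""
    first = count_candidates(search(term))
    if first and first.most_common(1)[0][1] >= min_count:
        # Enough evidence in the first round; no second round needed.
        return [cand for cand, _ in first.most_common(top_k)]

    # Sparse evidence: query again with the term plus each first-round
    # candidate, then pool the counts from both rounds.
    pooled = Counter(first)
    for cand, _ in first.most_common(top_k):
        pooled.update(count_candidates(search(f"{term} {cand}")))
    return [cand for cand, _ in pooled.most_common(top_k)]


# Toy stand-in for a Web search engine; all entries are invented.
TOY_INDEX = {
    "frequent": ["frequent 常見", "常見 frequent"],
    "hypoterm": ["hypoterm 假設詞 intro", "news about 其他 hypoterm"],
    "hypoterm 假設詞": ["hypoterm (假設詞) definition", "假設詞 hypoterm glossary"],
    "hypoterm 其他": [],
}


def toy_search(query: str) -> List[str]:
    """Look up snippets for a query in the toy index."""
    return TOY_INDEX.get(query, [])


if __name__ == "__main__":
    print(extract_translations("frequent", toy_search))
    print(extract_translations("hypoterm", toy_search))
```

In this toy setup the first round for `hypoterm` yields two candidates with one occurrence each, so the sketch issues second-round queries; only the query paired with the better candidate returns further supporting snippets, which separates the two. A real system would replace `toy_search` with an actual search API and the counting with a proper association measure.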