Further Experiments in Bilingual Text Alignment
Author: | Harold L. Somers |
---|---|
Year of publication: | 1998 |
Subject: | Linguistics and Language; Corpus analysis; Vocabulary; Text alignment; Computer science; Speech recognition; Similarity measure; Levenshtein distance; Language and Linguistics; Parallel corpora; German language; Artificial intelligence; Natural language processing |
Source: | International Journal of Corpus Linguistics, 3:115-150 |
ISSN: | 1569-9811, 1384-6655 |
DOI: | 10.1075/ijcl.3.1.06som |
Description: | We describe and experimentally evaluate an alternative algorithm for aligning and extracting vocabulary from parallel texts using recency vectors and a similarity measure based on Levenshtein distance. The work is largely inspired by Fung and McKeown's DK-vec, though we use a simpler algorithm. The technique is tested on two sets of parallel corpora involving English, French, German, Dutch, Spanish, and Japanese. We attempt to evaluate the importance of parameters such as the frequency of words chosen as candidates, the effect of different language pairings, and differences between the two corpora. |
Database: | OpenAIRE |
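The idea named in the description (comparing recency vectors with a Levenshtein-based similarity) can be illustrated with a minimal Python sketch. This is not Somers's implementation: the definition of a recency vector as token-offset gaps between successive occurrences of a word, the `tolerance` parameter for treating nearly equal gaps as matches, the normalisation, and all function names are hypothetical choices made for illustration.

```python
def recency_vector(tokens, word):
    """Gaps between successive occurrences of `word` in `tokens`.

    Assumption (hypothetical): positions are token offsets, and the first
    entry is the gap from the start of the text to the first occurrence.
    """
    gaps = []
    prev = 0
    for pos, tok in enumerate(tokens):
        if tok == word:
            gaps.append(pos - prev)
            prev = pos
    return gaps


def levenshtein(a, b, tolerance=0):
    """Edit distance between two integer sequences.

    Insertions and deletions cost 1; a substitution costs 0 when the two
    gaps differ by at most `tolerance` (an assumed parameter), else 1.
    """
    prev_row = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        row = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if abs(x - y) <= tolerance else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1]


def similarity(v1, v2, tolerance=0):
    """Normalised similarity in [0, 1]; 1.0 means identical recency vectors."""
    longest = max(len(v1), len(v2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(v1, v2, tolerance) / longest


if __name__ == "__main__":
    source = "a x b x x".split()
    print(recency_vector(source, "x"))           # gaps for "x"
    print(similarity([1, 2, 3, 4], [1, 2, 3]))   # one deletion out of four
```

In an alignment setting one would presumably compute such vectors for candidate words in each half of a parallel text and treat high-similarity pairs as likely translation equivalents; how candidates are selected (for example by frequency) and where the similarity threshold lies are left open here.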