Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval
Author: | Sisay Fissaha Adafre, Jaap Kamps, Maarten de Rijke |
---|---|
Year of publication: | 2005 |
Subject: |
Information retrieval; Parsing; Machine translation; Computer science; business.industry; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Information access; computer.software_genre; Tokenization (data security); Human–computer information retrieval; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Vector space model; Multilingualism; Artificial intelligence; Document retrieval; business; computer; Natural language processing |
Source: | Multilingual Information Access for Text, Speech and Images; ISBN: 9783540274209; CLEF |
DOI: | 10.1007/11519645_12 |
Description: | Our approach to cross-lingual document retrieval starts from the assumption that effective monolingual retrieval is at the core of any cross-language retrieval system. We devote particular attention to three crucial ingredients of our approach to cross-lingual retrieval. First, effective tokenization techniques are essential to cope with morphological variations common in many European languages. Second, effective combination methods allow us to combine the best of different strategies. Finally, effective translation methods for translating queries or documents turn a monolingual retrieval system into a cross-lingual retrieval system proper. The viability of our approach is shown by a series of experiments in monolingual, bilingual, and multilingual retrieval. |
Database: | OpenAIRE |
External link: |