Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
Author: | Jean-Michel Renders, Eric Gaussier, Hervé Déjean, Fatiha Sadat |
---|---|
Publication year: | 2005 |
Subject: | Medical terminology; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Information Storage and Retrieval; Medicine (miscellaneous); Multilingualism; computer.software_genre; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Terminology; Domain (software engineering); Artificial Intelligence; Corpus linguistics; Terminology as Topic; Humans; Cross-language information retrieval; Language; Natural Language Processing; Electronic Data Processing; Thesaurus (information retrieval); Information retrieval; business.industry; Unified Medical Language System; Information extraction; ComputingMethodologies_PATTERNRECOGNITION; Vocabulary, Controlled; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; business; computer; Medical Informatics; Natural language processing |
Source: | Artificial Intelligence in Medicine. 33:111-124 |
ISSN: | 0933-3657 |
Description: | Objectives: This article presents experiments on multilingual information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working with specialized languages and specific domains. Material and methods: We first propose a method for enriching multilingual thesauri that extracts new terms from parallel corpora, and second, a new approach to bilingual lexicon extraction from comparable corpora that uses a bilingual thesaurus as a pivot. We illustrate the use of both in multilingual information retrieval (English/German) in the medical domain. Results: Our experiments show that the automatically extracted bilingual lexicons are accurate enough (85% precision for term extraction) for semi-automatic enrichment of mono- or bilingual thesauri such as the Unified Medical Language System. Their use in cross-language information retrieval significantly improves retrieval performance (from 22% to 40% average precision) and clearly outperforms existing bilingual lexicon resources, both general and specialized. Conclusion: We show first that bilingual lexicon extraction from parallel corpora in the medical domain can yield accurate, specialized lexicons that help enrich existing thesauri, and second that bilingual lexicons extracted from comparable corpora outperform general bilingual resources for cross-language information retrieval. |
Database: | OpenAIRE |
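
The retrieval figures in the description (22% to 40% average precision) are stated in terms of a standard ranked-retrieval metric. As a minimal sketch, and not the authors' evaluation code, uninterpolated average precision for a single query could be computed as below. The function name and the toy document IDs are hypothetical, the ranking is assumed to contain no duplicate IDs, and the record does not say which exact variant of the metric the authors used.

```python
def average_precision(ranked_ids, relevant_ids):
    """Uninterpolated average precision for one query.

    ranked_ids:   document IDs in the order the system returned them
                  (assumed to contain no duplicates).
    relevant_ids: IDs judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention chosen here for queries with no relevant documents

    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant document appears.
            precision_sum += hits / rank

    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


# Toy example: relevant documents appear at ranks 2 and 4.
print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # 0.5
```

Averaging this value over all queries in a test collection gives mean average precision, which is how a single summary figure per system is usually obtained.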