The automatic generation of thesauri of related words for English, French, German, and Russian

Author:	Reinhard Rapp
Publication year:	2008
Subject:	Text corpus; Linguistics and Language; Generalization; business.industry; Computer science; First language; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Thesaurus; computer.software_genre; Language and Linguistics; language.human_language; Human-Computer Interaction; German; Word lists by frequency; Corpus linguistics; language; Computer Vision and Pattern Recognition; Artificial intelligence; business; computer; Software; Word (computer architecture); Natural language processing
Source:	International Journal of Speech Technology. 11:147-156
ISSN:	1572-8110 1381-2416
DOI:	10.1007/s10772-009-9043-7
Description:	A method for the automatic extraction of words with similar meanings is presented, based on the analysis of word distributions in large monolingual text corpora. It involves compiling matrices of word co-occurrences and reducing the dimensionality of the semantic space by means of a singular value decomposition. This reduces problems of data sparseness and achieves a generalization effect that considerably improves the results. The method is largely language-independent and has been applied to corpora of English, French, German, and Russian; the resulting thesauri are freely available. The English thesaurus has been evaluated by comparing it with experimental results obtained from test persons who were asked to judge word similarities. According to this evaluation, the machine-generated results come close to native speakers' performance.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::25e71770367dd45c429e8cb7f6e2e3ab https://doi.org/10.1007/s10772-009-9043-7
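
The pipeline summarized in the abstract (count word co-occurrences, reduce the matrix with a singular value decomposition, then compare words in the reduced space) can be illustrated with a minimal sketch. This is not the author's implementation: the record gives no window size, weighting scheme, number of dimensions, or similarity measure, so the symmetric two-word window, raw counts, `k=3`, and cosine similarity below are all assumptions, and the function names and toy corpus are hypothetical.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count how often each word pair occurs within `window` tokens (assumed setting)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts


def reduce_dimensions(counts, k):
    """Truncated SVD: keep the k largest singular values and scale the left vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def most_similar(word, vocab, vectors, n=3):
    """Rank the other words by cosine similarity to `word` (assumed measure)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit[vocab.index(word)]
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != word]
    return ranked[:n]


# Hypothetical toy corpus; real experiments would use large monolingual corpora.
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]
vocab, counts = cooccurrence_matrix(sentences)
vectors = reduce_dimensions(counts, k=3)
print(most_similar("cat", vocab, vectors))
```

On a corpus this small the ranking carries little meaning; the sketch only shows where the dimensionality reduction sits between counting and comparison.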