Corpus-dependent association thesauri for information retrieval

Author:	Hiroyuki Kaji, Yasutsugu Morimoto, Toshiko Aizono, Noriyuki Yamasaki
Publication year:	2000
Subject:	Text corpus; Structure (mathematical logic); Thesaurus (information retrieval); Information retrieval; business.industry; Computer science; Association (object-oriented programming); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; computer.software_genre; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Term (time); ComputingMethodologies_PATTERNRECOGNITION; Component (UML); Noun; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; Cluster analysis; business; computer; Natural language processing
Source:	COLING
DOI:	10.3115/990820.990879
Description:	This paper presents a method for automatically generating an association thesaurus from a text corpus, and demonstrates its application to information retrieval. The thesaurus generation method consists of extracting terms and co-occurrence data from a corpus and analyzing the correlation between terms statistically. A new method for disambiguating the structure of compound nouns, which is a key component for term extraction, is also proposed. The automatically generated thesaurus is effectively used as a tool for exploring information. A thesaurus navigator having novel functions such as term clustering, thesaurus overview, and zooming-in is proposed.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::796d50b1fa2169c9c32e6dcecaf99d72 https://doi.org/10.3115/990820.990879
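
The description mentions collecting co-occurrence data for extracted terms and analyzing the correlation between terms statistically. The record does not say which statistic the authors use, so the following is only a minimal sketch under assumptions: it takes already-extracted term lists (one per co-occurrence unit, such as a sentence) and scores term pairs with pointwise mutual information. The function name `build_association_thesaurus` and the `min_pmi` threshold are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def build_association_thesaurus(units, min_pmi=0.0):
    """Map each term to its associated terms, strongest association first.

    units: list of term lists, one per co-occurrence unit (assumed to be
    the output of a separate term-extraction step).
    min_pmi: hypothetical cutoff; pairs scoring at or below it are dropped.
    """
    n_units = len(units)
    term_count = Counter()  # number of units containing each term
    pair_count = Counter()  # number of units containing both terms

    for terms in units:
        unique = sorted(set(terms))
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))

    thesaurus = {}
    for (a, b), n_ab in pair_count.items():
        # Pointwise mutual information: an assumed stand-in for the
        # paper's correlation measure, which the record does not name.
        pmi = math.log2(n_ab * n_units / (term_count[a] * term_count[b]))
        if pmi > min_pmi:
            thesaurus.setdefault(a, []).append((b, pmi))
            thesaurus.setdefault(b, []).append((a, pmi))

    for related in thesaurus.values():
        related.sort(key=lambda item: (-item[1], item[0]))
    return thesaurus


if __name__ == "__main__":
    sample_units = [
        ["thesaurus", "retrieval"],
        ["thesaurus", "retrieval"],
        ["corpus", "noun"],
        ["corpus", "retrieval"],
    ]
    for term, related in sorted(build_association_thesaurus(sample_units).items()):
        print(term, related)
```

Because the result depends entirely on the corpus supplied, two different corpora yield two different thesauri for the same term, which is the sense in which such a thesaurus is corpus-dependent.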