Corpus-dependent association thesauri for information retrieval
Author: | Hiroyuki Kaji, Yasutsugu Morimoto, Toshiko Aizono, Noriyuki Yamasaki |
---|---|
Year of publication: | 2000 |
Subject: | Text corpus; Structure (mathematical logic); Thesaurus (information retrieval); Information retrieval; business.industry; Computer science; Association (object-oriented programming); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; computer.software_genre; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Term (time); ComputingMethodologies_PATTERNRECOGNITION; Component (UML); Noun; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; Cluster analysis; business; computer; Natural language processing |
Source: | COLING |
DOI: | 10.3115/990820.990879 |
Description: | This paper presents a method for automatically generating an association thesaurus from a text corpus, and demonstrates its application to information retrieval. The thesaurus generation method consists of extracting terms and co-occurrence data from a corpus and analyzing the correlation between terms statistically. A new method for disambiguating the structure of compound nouns, which is a key component for term extraction, is also proposed. The automatically generated thesaurus is effectively used as a tool for exploring information. A thesaurus navigator having novel functions such as term clustering, thesaurus overview, and zooming-in is proposed. |
Database: | OpenAIRE |