Terminology Identification in a Collection of Web Resources.

Author: Godby, Carol Jean (godby@oclc.org); Reighart, Ray (reighart@oclc.org)
Source: Journal of Internet Cataloging. Nov. 2001, Vol. 4 Issue 1/2, pp. 49-65. 17p.
Abstract: The primary goal of the WordSmith project is to obtain subject terminology directly from raw text. We are currently investigating the hypothesis that reliable subject terms can be automatically collected, re-used, and organized into thesaurus-like objects that enhance access to material that is unwieldy to classify by hand, such as the Web documents in the CORC database. Baseline results of our work are already visible in the CORC project. Catalogers who check the "Generate possible subject terms" button in the process of creating a description for a new item may retrieve novel subject terms, such as animal genome databases, backcountry Web sites, digital communities, e-mail viruses, and worldwide Internet music. These terms are too new to appear in standard library classification schemes. In later versions of CORC, we want to make automatic keyword assignment more responsive to the needs of catalogers and use this terminology in other ways to increase subject access to the CORC collection. Our paper describes the current implementation of WordSmith in CORC, an evaluation of the results, and proposed future enhancements. [ABSTRACT FROM PUBLISHER]
Database: Library, Information Science & Technology Abstracts