Dictionary buildup and stability of word frequency in a specialized medical area

Authors: John M. Long, Gertrude C. Levy, Howard J. Barnhard
Publication year: 1967
Subject:
Source: American Documentation. 18:21-25
ISSN: 1936-6108, 0096-946X
DOI: 10.1002/asi.5090180105
Description: This is a report of word usage in radiological (x-ray) patient records as found in a 5% sample of the annual case load at UAMC including 100,000 words. Records were taken exactly as dictated. The study is part of an effort to develop an IR system for patient data. The system “autocodes” (automatically stores) the physician's dictated findings and diagnoses in such a fashion that they can be retrieved again automatically. Some of our findings approximate results reported in the literature. For example, the rate of introduction of new different words levels off to about 2,500 words when 40,000 to 50,000 words of text have been analyzed. However, unclassified words continue to occur at a significant level of almost 2% at the 100,000-word level, with a 1% noise level. Attempts to establish the rank order of words beyond the first several hundred have failed because about 70% of the words appear to occur with such a low relative frequency (no more than one time in 10,000). Thus, establishing files by rank order appears impractical, even though filter lists (discard words) by rank groups (words with nearly the same relative frequency) are quite practical. Additional data are presented and design implications are discussed.
Database: OpenAIRE
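
The three measurements named in the description (growth in the number of different words as more text is read, the share of words occurring no more than one time in 10,000, and grouping words of equal frequency into rank groups) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' procedure: the function names, the toy token list, and the use of exact-count ties as "rank groups" are all hypothetical choices made here.

```python
from collections import Counter, defaultdict


def vocabulary_growth(tokens, step):
    """Return (tokens_read, distinct_words_so_far) pairs.

    A pair is recorded every `step` tokens and once more at the end of the text.
    """
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


def low_frequency_share(tokens, threshold=1e-4):
    """Fraction of distinct words whose relative frequency is <= threshold.

    The default threshold mirrors "one time in 10,000" (an assumption
    about how that figure would be applied).
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0
    total = len(tokens)
    rare = sum(1 for c in counts.values() if c / total <= threshold)
    return rare / len(counts)


def rank_groups(tokens):
    """Group words that share the same count, most frequent group first.

    Exact ties stand in for "nearly the same relative frequency" here;
    a real filter list would likely use frequency bands instead.
    """
    groups = defaultdict(list)
    for word, count in Counter(tokens).items():
        groups[count].append(word)
    return [(count, sorted(groups[count])) for count in sorted(groups, reverse=True)]


# Toy input (hypothetical, not data from the study).
tokens = "no fracture seen no fracture no lesion".split()
print(vocabulary_growth(tokens, step=3))
print(low_frequency_share(tokens, threshold=0.15))
print(rank_groups(tokens))
```

On a real corpus the growth curve would be read for the point where new words stop appearing quickly, and the lower rank groups would be the ones where individual rank order stops being meaningful.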