Entropy, semantic relatedness and proximity

Authors: Robert Michael Sivley, Lance W. Hahn
Year of publication: 2011
Source: Behavior Research Methods, 43, 746–760
ISSN: 1554-3528
Description: Although word co-occurrences within a document have been demonstrated to be semantically useful, word interactions over a local range have been largely neglected by psychologists due to practical challenges. Shannon's (Bell System Technical Journal, 27, 379–423, 623–665, 1948) conceptualization of information theory suggests that these interactions should be useful for understanding communication. Computational advances make an examination of local word–word interactions possible for a large text corpus. We used Brants and Franz's (2006) dataset to generate conditional probabilities for 62,474 word pairs and entropy calculations for 9,917 words in Nelson, McEvoy, and Schreiber's (Behavior Research Methods, Instruments, & Computers, 36, 402–407, 2004) free association norms. Semantic associativity correlated moderately with the probabilities and was stronger when the two words were not adjacent. The number of semantic associates for a word and the entropy of a word were also correlated. Finally, language entropy decreases from 11 bits for single words to 6 bits per word for four-word sequences. The probabilities and entropies discussed here are included in the supplemental materials for the article.
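For reference, the quantities in the abstract are presumably the standard Shannon (1948) definitions; the following is a sketch under that assumption, with c(·) denoting an n-gram count in the corpus (notation introduced here, not taken from the article). The conditional probability of a word pair and the entropy of a single word would be

\[
p(w_2 \mid w_1) = \frac{c(w_1\,w_2)}{c(w_1)}, \qquad
H(W) = -\sum_{w} p(w)\,\log_2 p(w),
\]

and the per-word entropy of n-word sequences, the quantity behind the reported drop from roughly 11 bits (n = 1) to 6 bits per word (n = 4), would be

\[
\frac{H_n}{n} = -\frac{1}{n}\sum_{w_1 \ldots w_n} p(w_1 \ldots w_n)\,\log_2 p(w_1 \ldots w_n).
\]

The decline with n reflects how strongly preceding context constrains each successive word.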
Database: OpenAIRE