The distribution of English dictionary word lengths

Author: Lord Rothschild
Year of publication: 1986
Subject:
Source: Journal of Statistical Planning and Inference, 14:311-322
ISSN: 0378-3758
DOI: 10.1016/0378-3758(86)90169-2
Description: Bagnold (1983) used the distribution of English dictionary word lengths to support the proposition that distributions arising from natural processes are not Gaussian but skewed, with exponential tails. This paper examines that proposition. The distribution of written English word lengths, obtained from a sample of about 2500 dictionary words, is reasonably well fitted by a shifted Poisson distribution with mean and variance of 6.94 and 5.80, respectively. The departure from the Poisson model is due mainly to the numbers of two-, three- and four-letter words. The distribution of written word lengths in various languages is an important part of Bagnold's thesis, which also claims that the negative exponential distribution, or some form of double negative exponential distribution, is ubiquitous. The results described in this paper require that this thesis be rejected. Mean word length is shorter for spoken words than for written words. Furthermore, spoken word lengths are lognormally distributed, whereas written word lengths are not. Some unpublished Arabic and Farsi word-length distributions, both bimodal, are also included.
Database: OpenAIRE
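The description reports that written word lengths are reasonably well fitted by a shifted Poisson distribution. As a minimal sketch of that model, assume a word length L = s + Y with Y ~ Poisson(λ), so that E[L] = s + λ and Var[L] = λ. The fixed shift s = 1 (no zero-letter words) and the helper names `shifted_poisson_pmf`, `fit_lambda` and `expected_counts` are hypothetical illustrations, not the paper's own fitting procedure.

```python
import math
from typing import Dict, Sequence


def shifted_poisson_pmf(n: int, lam: float, shift: int = 1) -> float:
    """P(L = n) for L = shift + Y, Y ~ Poisson(lam), lam > 0; zero below the shift."""
    k = n - shift
    if k < 0:
        return 0.0
    # Work in log space so lam**k and k! cannot overflow for long words.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def fit_lambda(lengths: Sequence[int], shift: int = 1) -> float:
    """Estimate lam with the shift held fixed (an assumption): lam = mean - shift.

    Under this model the implied variance equals lam, which can be compared
    with the sample variance as a rough check of the fit.
    """
    mean = sum(lengths) / len(lengths)
    return mean - shift


def expected_counts(sample_size: int, lam: float, shift: int = 1,
                    max_len: int = 25) -> Dict[int, float]:
    """Expected number of words of each length in a sample of sample_size words."""
    return {n: sample_size * shifted_poisson_pmf(n, lam, shift)
            for n in range(shift, max_len + 1)}


if __name__ == "__main__":
    # Hypothetical inputs: the reported mean of 6.94, a sample of about
    # 2500 words, and an assumed shift of 1, so lam = 6.94 - 1.
    lam = 6.94 - 1
    for length, count in expected_counts(2500, lam).items():
        print(f"{length:2d} letters: {count:7.1f}")
```

Comparing such expected counts with observed counts for each length (for example, with a chi-square statistic) would show where the model departs from the data; the description attributes the main departure to two-, three- and four-letter words.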