The distribution of English dictionary word lengths
Author: | Lord Rothschild |
---|---|
Year of publication: | 1986 |
Subject: | Statistics and Probability; Discrete mathematics; Exponential distribution; Distribution (number theory); Applied Mathematics; Gaussian; Skew; Poisson distribution; Exponential function; Log-normal distribution; Statistics, Probability and Uncertainty; Arithmetic; Word (computer architecture); Mathematics |
Source: | Journal of Statistical Planning and Inference. 14:311-322 |
ISSN: | 0378-3758 |
DOI: | 10.1016/0378-3758(86)90169-2 |
Description: | The distribution of English dictionary word lengths has been used by Bagnold (1983) to promote the proposition that distributions arising from natural processes are not Gaussian but skew distributions with exponential tails. This proposition is examined. The distribution of written English word lengths, obtained from a sample of about 2500 words from a dictionary, is reasonably well fitted by a shifted Poisson distribution with mean and variance equal to 6.94 and 5.80, respectively. The departure from the Poisson model is mainly due to the number of two-, three- and four-letter words. The distribution of written word lengths in various languages constitutes an important part of Bagnold's thesis, which also ascribes ubiquity to the negative exponential or some form of double negative exponential distribution. The results described in this paper require the rejection of this thesis. The mean word length of the spoken word is shorter than that of the written word. Furthermore, the distribution of the former is lognormal whereas that of the latter is not. Some unpublished Arabic and Farsi word length distributions, both bimodal, are included. |
Database: | OpenAIRE |
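
The abstract's shifted Poisson model can be illustrated with a minimal sketch. The record does not state the shift or the Poisson rate, so both are assumptions here: a shift of 1 (the shortest word has one letter) and a rate chosen so that the model mean matches the reported 6.94. The names `shifted_poisson_pmf`, `SHIFT` and `LAM` are hypothetical and not taken from the paper.

```python
import math


def shifted_poisson_pmf(k: int, lam: float, shift: int = 1) -> float:
    """P(L = k) for L = shift + Y, where Y ~ Poisson(lam)."""
    if k < shift:
        return 0.0
    y = k - shift
    return math.exp(-lam) * lam ** y / math.factorial(y)


# Assumed values (not given in the record): shift of 1 letter,
# rate set so that the model mean equals the reported 6.94.
SHIFT = 1
LAM = 6.94 - SHIFT

# Word lengths up to 59 letters; the probability mass beyond that is negligible.
lengths = range(SHIFT, 60)
probs = [shifted_poisson_pmf(k, LAM, SHIFT) for k in lengths]

# Moments computed directly from the probability mass function.
mean = sum(k * p for k, p in zip(lengths, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(lengths, probs))

print(f"mean={mean:.2f} variance={var:.2f}")
```

For a shifted Poisson variable the mean is the shift plus the rate, while the variance equals the rate alone. Under the assumptions above the model variance is therefore 5.94, which can be compared with the 5.80 quoted in the abstract.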