Testing the Relationship between Word Length, Frequency, and Predictability Based on the German Reference Corpus

Authors:	Alexander Koplenig, Marc Kupietz, Sascha Wolfer
Year of publication:	2021
Subject:	Reading; Artificial Intelligence; Cognitive Neuroscience; Humans; Experimental and Cognitive Psychology; Linguistics; Language
Source:	Cognitive Science. 46(6)
ISSN:	1551-6709
Description:	In a recent article, Meylan and Griffiths (Meylan & Griffiths, 2021; henceforth, MG) focus their attention on the significant methodological challenges that can arise when using large-scale linguistic corpora. To this end, MG revisit a well-known result of Piantadosi, Tily, and Gibson (2011; henceforth, PTG), who argue that average information content is a better predictor of word length than word frequency. We applaud MG, who conducted a very important study that should be read by any researcher interested in working with large-scale corpora. The fact that MG mostly failed to find clear evidence in favor of PTG's main finding motivated us to test PTG's idea on a subset of the largest archive of German language texts designed for linguistic research, the German Reference Corpus, consisting of ∼43 billion words. We find very little support for the primary data point reported by PTG.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7f807b51d5198c0cd7a2a67f07689256 https://pubmed.ncbi.nlm.nih.gov/35661231