Can Menzerath's law be a criterion of complexity in communication?

Authors: Łukasz Dębowski, Iván González Torre, Antoni Hernández-Fernández
Contributors: Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
Language: English
Year of publication: 2021
Subjects:
Systems Analysis
Semantics (computer science)
Menzerath-Altmann's law
Social Sciences
Computational linguistics
Hidden Markov Model
Mathematical and Statistical Techniques
Linguistic laws
Menzerath's law
Mathematics
Language
Grammar
Multidisciplinary
Mathematical Models
Communication
Contrast (statistics)
Syllables
Semantics
Monkey typing
Memoryless source
Physical Sciences
Medicine
Syllable
Computer science::Artificial intelligence::Natural language [UPC subject areas]
Word (computer architecture)
Research Article
Science
Phonology
Research and Analysis Methods
Phonetics
Consonants
Standardized Project Gutenberg Corpus
Speech
Humans
Arithmetic
Spurious relationship
Vowels
Stochastic Processes
Models, Statistical
Null model
Linguistics
Models, Theoretical
Probability Theory
Range (mathematics)
Languages
Computational linguistics
Source: PLoS ONE, Vol. 16, Iss. 8, p. e0256133 (2021)
PLoS ONE
UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
ISSN: 1932-6203
Description: Menzerath’s law is a quantitative linguistic law which states that, on average, the longer a linguistic construct, the shorter its constituents. In contrast, Menzerath-Altmann’s law (MAL) is a precise mathematical power-law-exponential formula which expresses the expected length of the linguistic construct conditioned on the number of its constituents. In this paper, we investigate the anatomy of MAL for constructs being word tokens and constituents being syllables, with lengths measured in graphemes. First, we derive the exact form of MAL for texts generated by the memoryless source with three emitted symbols, which can be interpreted as a "monkey typing" model or a null model. We show that this null model complies with Menzerath’s law, revealing that Menzerath’s law itself can hardly be a criterion of complexity in communication. This observation does not apply to the more precise Menzerath-Altmann’s law, which predicts an inverted regime for sufficiently long constructs, i.e., the longer a word, the longer its syllables. To support this claim, we analyze MAL on data from 21 languages, consisting of texts from the Standardized Project Gutenberg Corpus. We show the presence of the inverted regime, which is not exhibited by the null model, and we demonstrate the robustness of our results. We also report the complicated distribution of syllable sizes with respect to their position in the word, which might be related to the emerging MAL. Altogether, our results indicate that Menzerath’s law, in terms of correlations, is a spurious observation, while complex patterns and efficiency dynamics should rather be attributed to specific forms of Menzerath-Altmann’s law.
This work has been funded by the project PRO2021-S03-HERNANDEZ (Institut d’Estudis Catalans), where AHF is the principal investigator.
URL: https://futur.upc.edu/30546321
AHF is also funded by the grant TIN2017-89244-R from the Ministerio de Economía, Industria y Competitividad (Gobierno de España) and supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya).
URL: https://futur.upc.edu/2202438
Peer reviewed
Sustainable Development Goals::9 - Industry, Innovation and Infrastructure
Database: OpenAIRE
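The null-model argument in the description (a memoryless "monkey typing" source already follows Menzerath's law) can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' exact model or code: the three symbols (consonant `c`, vowel `v`, space), their emission probabilities, the rule of one syllable per vowel, and the hypothetical function names `monkey_text` and `mean_syllable_length_by_count`. The sketch groups word tokens by syllable count n and reports their mean syllable length in graphemes.

```python
import random
from collections import defaultdict
from typing import Dict

# Assumed alphabet of the memoryless source (illustrative, not from the paper):
# "c" = consonant, "v" = vowel, " " = word separator.
SYMBOLS = ("c", "v", " ")
PROBS = (0.5, 0.3, 0.2)  # hypothetical emission probabilities


def monkey_text(n_symbols: int, seed: int = 0) -> str:
    """Emit n_symbols i.i.d. symbols, i.e. a memoryless ("monkey typing") source."""
    rng = random.Random(seed)
    return "".join(rng.choices(SYMBOLS, weights=PROBS, k=n_symbols))


def mean_syllable_length_by_count(text: str) -> Dict[int, float]:
    """Mean syllable length (graphemes) of word tokens, grouped by syllable count.

    Assumes one syllable per vowel, so a word of length L with n vowels has
    mean syllable length L / n. Words without a vowel are skipped.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for word in text.split(" "):
        n = word.count("v")
        if n == 0:
            continue  # empty tokens and vowel-less words have no syllables
        totals[n] += len(word) / n
        counts[n] += 1
    return {n: totals[n] / counts[n] for n in sorted(counts)}


if __name__ == "__main__":
    result = mean_syllable_length_by_count(monkey_text(200_000, seed=1))
    for n, y in result.items():
        if n <= 5:
            print(n, round(y, 3))
```

Under the assumed probabilities, a word with n vowels contains n + 1 independent consonant runs, each geometric with mean 0.5 / (1 - 0.5) = 1, so the expected mean syllable length is 1 + (n + 1)/n = 2 + 1/n. In this sketch it only decreases with n (Menzerath-type behaviour) and never turns upward, so no inverted regime appears.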