Author: |
Łukasz Dębowski |
Language: |
English |
Year of publication: |
2021 |
Subject: |
|
Source: |
Entropy, Vol 23, Iss 9, p 1148 (2021) |
Document type: |
article |
ISSN: |
1099-4300 |
DOI: |
10.3390/e23091148 |
Description: |
We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we suppose that the semantic properties of texts in a natural language could be approximately captured by a recently introduced concept of a perigraphic process. Perigraphic processes are a class of stochastic processes that satisfy a Zipf-law accumulation of a subset of factual knowledge, which is time-independent, compressed, and effectively inferrable from the process. We show that the classes of finite-state processes and of perigraphic processes are disjoint, and we present a new simple example of perigraphic processes over a finite alphabet called Oracle processes. The disjointness result makes use of the Hilberg condition, i.e., the almost sure power-law growth of algorithmic mutual information. Using a strongly consistent estimator of the number of hidden states, we show that finite-state processes do not satisfy the Hilberg condition whereas Oracle processes satisfy the Hilberg condition via the data-processing inequality. We discuss the relevance of these mathematical results for theoretical and computational linguistics. |
Database: |
Directory of Open Access Journals |