Showing 1 - 10 of 107 for search: '"Dębowski, Łukasz"'
Author:
Dębowski, Łukasz
Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of a different cardinality, such as letters and words. We introduce an invertible mapping between such measure
External link:
http://arxiv.org/abs/2409.13600
Author:
Dębowski, Łukasz
In the article "Universal Densities Exist for Every Finite Reference Measure" (IEEE Trans. Inform. Theory, vol. 69, no. 8, pp. 5277--5288, 2023) we neglected to mention relevant contributions of Boris Ryabko. We cited a source by him that contains a
External link:
http://arxiv.org/abs/2311.08951
Author:
Dębowski, Łukasz
The article introduces corrections to Zipf's and Heaps' laws based on systematic models of the proportion of hapaxes, i.e., words that occur once. The derivation rests on two assumptions: The first one is the standard urn model which predicts that ma
External link:
http://arxiv.org/abs/2307.12896
Author:
Dębowski, Łukasz
By an analogy to the duality between the recurrence time and the longest match length, we introduce a quantity dual to the maximal repetition length, which we call the repetition time. Extending prior results, we sandwich the repetition time in terms
External link:
http://arxiv.org/abs/2306.14703
Author:
Dębowski, Łukasz
It was observed that large language models exhibit a power-law decay of cross entropy with respect to the number of parameters and training tokens. When extrapolated literally, this decay implies that the entropy rate of natural language is zero. To
External link:
http://arxiv.org/abs/2302.09049
Author:
Dębowski, Łukasz
We present an impossibility result, called a theorem about facts and words, which pertains to a general communication system. The theorem states that the number of distinct words used in a finite text is roughly greater than the number of independent
External link:
http://arxiv.org/abs/2211.01031
Author:
Dębowski, Łukasz
We revisit the problem of minimal local grammar-based coding. In this setting, the local grammar encoder encodes grammars symbol by symbol, whereas the minimal grammar transform minimizes the grammar length in a preset class of grammars as given by t
External link:
http://arxiv.org/abs/2209.13636
Author:
Dębowski, Łukasz
Published in:
IEEE Transactions on Information Theory, vol. 69(8), pp. 5277-5288, 2023
As it is known, universal codes, which estimate the entropy rate consistently, exist for stationary ergodic sources over finite alphabets but not over countably infinite ones. We generalize universal coding as the problem of universal densities with
External link:
http://arxiv.org/abs/2209.11981
Author:
Dębowski, Łukasz; Steifer, Tomasz
Published in:
The Bulletin of Symbolic Logic, 2022 Sep 01. 28(3), 387-412.
External link:
https://www.jstor.org/stable/27166953
Author:
Dębowski, Łukasz
Inspired by Hilberg's hypothesis, which states that mutual information between blocks for natural language grows like a power law, we seek for links between power-law growth rate of algorithmic mutual information and of some estimator of the unifilar
External link:
http://arxiv.org/abs/2011.12845