Data driven weak universal consistency
Authors: | Santhanam, N., Anantharam, V., Szpankowski, W. |
Publication year: | 2014 |
Subject: | |
Source: | Journal of Machine Learning Research 23 (2022) 1-55 |
Document type: | Working Paper |
Description: | Many current applications in data science need rich model classes to adequately represent the statistics that may be driving the observations. But rich model classes may be too complex to admit estimators that converge to the truth at rates that can be uniformly bounded over the entire collection of probability distributions comprising the model class; i.e., it may be impossible to guarantee uniform consistency of such estimators as the sample size increases. In such cases, it is conventional to settle for estimators whose convergence-rate guarantees hold only in a model-dependent way, i.e., pointwise consistent estimators. But this viewpoint has the serious drawback that estimator performance depends on the unknown model being estimated, and is therefore itself unknown: even if an estimator is consistent, how well it is doing at any given time may not be clear, no matter the sample size of the observations. Departing from the classical uniform/pointwise consistency dichotomy that leads to this impasse, a new analysis framework is explored by studying rich model classes that may admit only pointwise consistency guarantees, yet for which all the information about the unknown model driving the observations that is needed to gauge estimator accuracy can be inferred from the sample at hand. We expect this data-derived estimation framework to be broadly applicable to a wide range of estimation problems, since it provides a methodology for dealing with much richer model classes. In this paper we analyze the lossless compression problem in detail within this novel data-derived framework. Comment: Published in JMLR 2022 |
Database: | arXiv |
External link: |
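The uniform/pointwise distinction in the abstract can be illustrated with a minimal sketch that is not from the paper itself: estimating the mean of a distribution whose standard deviation `sigma` is unknown. For any fixed model, the standard error of the sample mean shrinks as the sample size grows (pointwise consistency), but at any fixed sample size the error is unbounded over a class with unbounded `sigma`, so no single rate holds uniformly over the class. The function name `mean_se` and the specific parameter values are illustrative assumptions.

```python
import math

def mean_se(sigma: float, n: int) -> float:
    # Standard error of the sample mean of n i.i.d. draws
    # from a distribution with standard deviation sigma.
    return sigma / math.sqrt(n)

# Pointwise consistency: for a fixed model (fixed sigma),
# the error goes to zero as the sample size n grows.
fixed_sigma = 2.0
errs = [mean_se(fixed_sigma, n) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]

# No uniform guarantee: at a fixed n, the error over a class
# with unbounded sigma can be made arbitrarily large, so no
# convergence rate can be bounded uniformly over the class.
n_fixed = 1000
worst = max(mean_se(s, n_fixed) for s in (1.0, 10.0, 100.0))
```

The paper's point is that, for suitable rich model classes, the quantity playing the role of the unknown `sigma` can itself be gauged from the observed sample, making estimator accuracy data-derived rather than merely model-dependent.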