Non-parametric change-point detection using string matching algorithms
Author: | Oliver Johnson, Ayalvadi Ganesh, Robert J. Piechocki, Dino Sejdinovic, James Cruise |
---|---|
Language: | English |
Year of publication: | 2011 |
Subject: |
Statistics and Probability; FOS: Computer and information sciences; Fluid limit; Stationary distribution; Markov chain; General Mathematics; Information Theory (cs.IT); Computer Science - Information Theory; Probability (math.PR); Estimator; String searching algorithm; Information theory; Methodology (stat.ME); FOS: Mathematics; Entropy (information theory); Algorithm; Change detection; Mathematics - Probability; Statistics - Methodology; Mathematics |
Description: | Given the output of a data source taking values in a finite alphabet, we wish to detect change-points, that is, times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately detect the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we establish consistency of the CRECHE estimator under a related toy model, by establishing a fluid limit and using martingale arguments. |
Database: | OpenAIRE |
External link: |
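
The description mentions a test case in which a source switches from a Markov chain to an IID source with the same stationary distribution, and it refers to match lengths from information theory. The following is a minimal sketch of that setting only, not the CRECHE estimator itself, whose definition this record does not give. The two-state alphabet, the function names (`stationary_two_state`, `markov_then_iid`, `longest_match_length`) and the parameters `p01`, `p10` are all assumptions introduced here for illustration.

```python
import random


def stationary_two_state(p01, p10):
    """Stationary distribution (pi0, pi1) of a two-state chain.

    p01 = P(0 -> 1), p10 = P(1 -> 0); both are illustrative parameters.
    """
    total = p01 + p10
    return (p10 / total, p01 / total)


def markov_then_iid(n, change_point, p01, p10, seed=0):
    """Binary sequence of length n: Markov before change_point, IID after.

    The IID part draws from the chain's stationary distribution, so the
    one-symbol frequencies are the same on both sides of the change.
    """
    rng = random.Random(seed)
    pi0, _ = stationary_two_state(p01, p10)
    out = []
    for t in range(n):
        if t == 0 or t >= change_point:
            # Stationary start, or IID regime after the change-point.
            x = 0 if rng.random() < pi0 else 1
        else:
            # Markov step: flip the previous symbol with the given probability.
            prev = out[-1]
            flip = p01 if prev == 0 else p10
            x = 1 - prev if rng.random() < flip else prev
        out.append(x)
    return out


def longest_match_length(seq, i):
    """Longest common prefix between the suffix at i and any other suffix.

    Naive quadratic scan; a suffix-based structure would be used in practice.
    """
    n = len(seq)
    best = 0
    for j in range(n):
        if j == i:
            continue
        k = 0
        while i + k < n and j + k < n and seq[i + k] == seq[j + k]:
            k += 1
        best = max(best, k)
    return best


if __name__ == "__main__":
    seq = markov_then_iid(n=400, change_point=200, p01=0.1, p10=0.1, seed=1)
    print([longest_match_length(seq, i) for i in (50, 150, 250, 350)])
```

Because both halves share the same symbol frequencies, a detector based on counting symbols alone cannot see this change; any difference has to come from the dependence structure, which is the kind of information match lengths are sensitive to.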