Popis: |
Pattern matching algorithms that find exact occurrences of a pattern $S\in\Sigma^m$ in a text $T\in\Sigma^n$ have been analyzed extensively with respect to asymptotic best-, worst-, and average-case running time. For more detailed analyses, the number of text character accesses $X^{\mathcal{A},S}_n$ performed by an algorithm $\mathcal{A}$ when searching a random text of length $n$ for a fixed pattern $S$ has been considered. Constructing a state space and corresponding transition rules (e.g., in a Markov chain) that reflect the behavior of a pattern matching algorithm is a key step in existing analyses of $X^{\mathcal{A},S}_n$ in both the asymptotic ($n\to\infty$) and the non-asymptotic regime. The size of this state space is hence a crucial parameter for such analyses. In this paper, we introduce a general methodology to construct corresponding state spaces and demonstrate that it applies to a wide range of algorithms, including Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Backward Oracle Matching (BOM), and Backward (Non-Deterministic) DAWG Matching (B(N)DM). In all cases except BOM, our method leads to state spaces of size $O(m^3)$ for pattern length $m$, a result that has previously been obtained only for BMH. For all other algorithms, only state spaces of size exponential in $m$ had been reported. Our results immediately imply an algorithm that computes the distribution of $X^{\mathcal{A},S}_n$ for fixed $S$, fixed $n$, and $\mathcal{A}\in\{\text{BM},\text{BMH},\text{B(N)DM}\}$ in polynomial time for a very general class of random text models. |
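To make the quantity $X^{\mathcal{A},S}_n$ concrete, the following minimal sketch counts text character accesses for $\mathcal{A}=\text{BMH}$ and estimates their distribution by simulation. It is an illustration only and not the state-space construction described above: the function names are hypothetical, the right-to-left window comparison is one common BMH variant, and the i.i.d. uniform text model is an assumption chosen for simplicity. A Monte Carlo estimate like this only approximates the distribution that the exact method computes.

```python
import random
from collections import Counter


def horspool_shift_table(pattern):
    """Shift per character: distance from its rightmost occurrence in
    pattern[:-1] to the last pattern position (later indices overwrite
    earlier ones, so the rightmost occurrence wins)."""
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def horspool_accesses(pattern, text):
    """Run Boyer-Moore-Horspool and return (text character accesses, match positions).

    Assumption: each window is compared right to left, and the shift reuses
    the character already read at the last window position, so the shift
    lookup is not counted as an extra access.
    """
    m, n = len(pattern), len(text)
    shift = horspool_shift_table(pattern)
    accesses = 0
    matches = []
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:
            accesses += 1  # one access to text[pos + j]
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Characters absent from pattern[:-1] allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return accesses, matches


def empirical_distribution(pattern, n, alphabet, trials, seed=0):
    """Monte Carlo estimate of the distribution of the access count for
    i.i.d. uniform random texts of length n (an assumed text model)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(n))
        counts[horspool_accesses(pattern, text)[0]] += 1
    return counts


if __name__ == "__main__":
    dist = empirical_distribution("abab", n=50, alphabet="ab", trials=1000)
    for x in sorted(dist):
        print(x, dist[x] / 1000)
```

Simulation of this kind needs many trials to resolve the tails of the distribution, which is one motivation for exact computation over a polynomial-size state space.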