Efficient bit-parallel multi-patterns string matching algorithms for limited expression

Authors: Ishadutta Yadav, Suneeta Agarwal, Bharat Singh, Rajesh Prasad
Year of publication: 2010
Subject:
Source: Bangalore Compute Conf.
DOI: 10.1145/1754288.1754298
Description: The problem of finding the occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from an alphabet Σ of size σ, is called the exact string matching problem. The problem of finding the occurrences of a set of patterns P0, P1, P2...Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. This problem has previously been solved by bit-parallel string matching algorithms: shift-or and Backward Non-deterministic DAWG Matching (BNDM). In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where the patterns are taken as "limited expressions". We define a limited expression as a subset of extended patterns that excludes regular expressions and optional and repeatable characters. Some examples are case-sensitive patterns and patterns containing classes of characters. The set of r patterns can be handled by converting it into a single pattern P, either by using classes of characters or by concatenating the characters of each pattern. We assume that every pattern has the same length m and that the total length of the pattern (after pre-processing) is less than or equal to the word length (w) of the computer used. We compare the performance of the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm.
Database: OpenAIRE
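
The class-of-characters conversion mentioned in the description can be illustrated with a minimal sketch. It is not the authors' q-gram algorithm: it uses plain BNDM, and the function names `build_class_masks` and `bndm_multi` are hypothetical. The r equal-length patterns are superimposed into one pattern whose position i accepts any character that occurs at position i of some pattern, so every window that BNDM reports is only a candidate and is checked against the original set. Python integers are unbounded, so the m ≤ w restriction from the description is not enforced here.

```python
def build_class_masks(patterns):
    """Superimpose equal-length patterns into one table of per-character bit masks."""
    if not patterns or len(patterns[0]) == 0:
        raise ValueError("need at least one non-empty pattern")
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length m")
    masks = {}
    for p in patterns:
        for i, ch in enumerate(p):
            # Bit (m - 1 - i) is set when ch is allowed at position i,
            # so position 0 of the pattern maps to the highest bit.
            masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))
    return masks, m


def bndm_multi(text, patterns):
    """Return (position, pattern) pairs found by BNDM on the class pattern."""
    masks, m = build_class_masks(patterns)
    pattern_set = set(patterns)
    full = (1 << m) - 1      # m low bits set: every factor is still alive
    high = 1 << (m - 1)      # set when the scanned suffix is a pattern prefix
    n = len(text)
    hits = []
    pos = 0
    while pos <= n - m:
        j = m                # characters of the window not yet read
        last = m             # safe shift: start of the longest prefix seen
        d = full
        while d != 0:
            # Read the window backwards and keep only the matching factors.
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    last = j
                else:
                    # The whole window matches the class pattern; verify it,
                    # because classes can mix characters of different patterns.
                    window = text[pos:pos + m]
                    if window in pattern_set:
                        hits.append((pos, window))
                    break
            d = (d << 1) & full
        pos += last
    return hits


# Usage: three patterns of length 3 collapse to the class pattern [cms]at.
print(bndm_multi("the cat sat on the mat", ["cat", "mat", "sat"]))
# prints [(4, 'cat'), (8, 'sat'), (19, 'mat')]
```

The verification step matters for sets such as {"ab", "cd"}: the superimposed pattern also accepts "ad" and "cb", which belong to neither pattern, so the sketch filters them out with a set lookup.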