On the Bit-Parallel Approaches to String Matching Problem

Autor: Ou, Chia-Shin, 歐家欣
Rok vydání: 2014
Druh dokumentu: 學位論文 ; thesis
Popis: 102
In this thesis, we consider two problems: (1) Developing a systematical approach to solve problems involving special properties of bit-vectors (2) Developing a bit-parallel approach to solve the nearest neighbor string matching problem. For problem 1, suppose that we are given a vector consisting of only 1's and 0's and we are interested in finding some special properties of this vector. These properties all involve "for-all" or "there-exists" notations and we are interested in bit-parallel processes to find these properties. In Myers' work, a sequence of logical operations can be expressed as a logical formula: (∃k ≤ i, A(k) and ∀x∈[k, i – 1]) = (((A & B) + B) ^ B) | A) which is by no means easy to be obtained. The contribution of our research is to present a systematical method to find such logical formulas to solve problems involving bit vectors with "for-all" and "there-exists" notations. Using our way of thinking, the equation in Myers' work can be deduced step by step. Five logical prototype problems, "single-for-all", "single-there- exists", "multiple-for-all", "multiple- there-exists" and "multiple-there-exists-and-for-all", are defined in this thesis and are proved that all can be computed using bit-parallel operations in O(n/w) time, where w is the word size of the machine. We also consider four variants of the five problems and show that their logical formulas can be obtained using those of the five prototype problems systematically. The nearest neighbor string matching problem is defined as follows: Given a text string T = t1t2…tn and a pattern string P = p1p2…pm, the nearest neighbor string matching problem is to find all substrings of T whose edit distances with P are the smallest, among all substrings of T. The nearest neighbor string matching problem has useful applications in bioinformatics. It can be straight-forwardly solved by the Seller's Algorithm and the Myers Algorithm which are used to solve the approximate string matching problem. Hyyro and Navarro proposed a filtering algorithm to speed up the Myers Algorithm. However, Hyyro and Navarro's filtering approach needs to perform a pre-processing based on the error bound k which is given by the definition of approximate string matching problem. Hence, it is not suitable to be used to solve the nearest neighbor string matching problem which has no k. In this thesis, we present a modification of the Hyyro and Navarro Algorithm, and also present a bit-parallel algorithm combining the Myers Algorithm and the modified Hyyro and Navarro Algorithm. In experiments, we show that our bit-parallel algorithm is efficient.
Databáze: Networked Digital Library of Theses & Dissertations