Algorithms to Compute K-Approximate Periodicities

Autor: Liu, Chenge
Jazyk: angličtina
Rok vydání: 2024
Druh dokumentu: Diplomová práce
Popis: A cover of a string w is a string u such that each letter of w occurs in some occurrences of u. In this thesis, we present the k-approximate maximal cover, where we allow for at most k mismatches at each occurrence of u in w, u is not required to cover the entire string and to compute the coverage we consider only the exact matches. Our ultimate objective is to find an approximate cover that encompasses the maximum number of exact positions covered while allowing for k mismatches. For this problem, we present an efficient algorithm that executes in O(n2kLave)-time, where n is the length of the given string and Lave is the average length of the longest common prefix between every substring and the original string with up to k mismatches allowed. To further broaden the scope of string analysis, we address two novel problems: the k-frequency cover and the k-border. The k-frequency cover problem seeks to identify the longest substring u of x, that has the maximum number of its k-approximate occurrences over all the other substrings in x. We propose an algorithm with a computational complexity of O(n2) to resolve this challenge efficiently. The longest k-border problem focuses on determining the longest prefix of string x that matches a suffix of x of the same length while allowing at most k mismatches. We propose an O(n2)-time algorithm to compute the longest k-border of every substring of x.
Thesis
Master of Science (MSc)
Regularities in strings have been studied for many years and play a crucial role in various fields, including biology, language theory, and compression theory. Periodicity has been one of the major problems studied in the field of string algorithms spanning several decades. Despite the advancements, accurately identifying various periodici- ties and quasi-periodicities within a string presents a notable challenge, particularly when the analysis must account for errors or inconsistencies. To address these com- plexities the notion of k-approximate periodicities has been introduced. This concept allows up to k mismatches between each occurrence of the periodicity in the original string, providing a more flexible approach to understanding string regularities. In this context, our work introduces innovative algorithms designed to efficiently com- pute the k-approximate maximal cover, k-frequency cover and longest k-border under the Hamming distance measure.
Databáze: Networked Digital Library of Theses & Dissertations