New algorithms for text fingerprinting
Autor: | Roman Kolpakov, Mathieu Raffinot |
---|---|
Rok vydání: | 2008 |
Předmět: |
Sequence
Data_MISCELLANEOUS Fingerprint (computing) Character encoding Text algorithm Naming Substring Hash table Theoretical Computer Science Combinatorics Set (abstract data type) Binary array Computational Theory and Mathematics Text fingerprinting Character set Table (database) Discrete Mathematics and Combinatorics Maximal location Arithmetic Mathematics |
Zdroj: | Journal of Discrete Algorithms. 6(2):243-255 |
ISSN: | 1570-8667 |
DOI: | 10.1016/j.jda.2007.05.001 |
Popis: | Let s=s1..sn be a text (or sequence) on a finite alphabet Σ. A fingerprint in s is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set F of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint f∈F is represented by a binary array, F, of size |Σ| named a fingerprint table. A fingerprint, f∈F, admits a number of maximal locations 〈i,j〉 in S, that is the alphabet of si..sj is f and si−1,sj+1, if defined, are not in f. The set of maximal locations is L,|L|⩽n|Σ|. We present new algorithms and a new data structure for the three problems: (1) compute F; (2) given F, answer if F represents a fingerprint in F; (3) given F, find all maximal locations of F in s. These problems are, respectively, solved in O(n+|L|log|Σ|), Θ(|Σ|), and Θ(|Σ|+K) time—where K is the number of maximal locations of F. |
Databáze: | OpenAIRE |
Externí odkaz: |