Fast Label Extraction in the CDAWG
Autor: | Fabio Cunial, Djamal Belazzougui |
---|---|
Rok vydání: | 2017 |
Předmět: |
Vertex (graph theory)
Reduction (recursion theory) Suffix tree String (computer science) 0102 computer and information sciences 02 engineering and technology 01 natural sciences law.invention Combinatorics 010201 computation theory & mathematics law 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Straight-line program MathematicsofComputing_DISCRETEMATHEMATICS Mathematics Directed acyclic word graph |
Zdroj: | String Processing and Information Retrieval ISBN: 9783319674278 SPIRE |
DOI: | 10.1007/978-3-319-67428-5_14 |
Popis: | The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from \(O(m\log {\log {n}})\) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from \(O(m\log {\log {n}}+\mathtt {occ})\) to \(O(m+\mathtt {occ})\) in the time needed to locate all the \(\mathtt {occ}\) occurrences of the pattern. We also reduce from \(O(k\log {\log {n}})\) to O(k) the time needed to read the k characters of the label of an edge of the suffix tree of T, and we reduce from \(O(m\log {\log {n}})\) to O(m) the time needed to compute the matching statistics between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. |
Databáze: | OpenAIRE |
Externí odkaz: |