Výsledky vyhledávání

Report

Text Indexing for Long Patterns using Locally Consistent Anchors

Autor: Ayad, Lorraine A. K., Loukides, Grigorios, Pissis, Solon P.

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such

Externí odkaz: http://arxiv.org/abs/2407.11819

Zobrazit plný text záznamu

Report

Minimizing the Minimizers via Alphabet Reordering

Autor: Verbeek, Hilde, Ayad, Lorraine A. K., Loukides, Grigorios, Pissis, Solon P.

Minimizers sampling is one of the most widely-used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S=S[1]\ldots S[n]$ be a string over a totally ordered alphabet $\Sigma$. Further let $w\geq 2$ and $k\geq 1$ be two integer

Externí odkaz: http://arxiv.org/abs/2405.04052

Zobrazit plný text záznamu

Report

Space-Efficient Indexes for Uncertain Strings

Autor: Gabory, Esteban, Liu, Chang, Loukides, Grigorios, Pissis, Solon P., Zuba, Wiktor

Strings in the real world are often encoded with some level of uncertainty. In the character-level uncertainty model, an uncertain string $X$ of length $n$ on an alphabet $\Sigma$ is a sequence of $n$ probability distributions over $\Sigma$. Given an

Externí odkaz: http://arxiv.org/abs/2403.14256

Zobrazit plný text záznamu

Report

Approximate Circular Pattern Matching under Edit Distance

Autor: Charalampopoulos, Panagiotis, Pissis, Solon P., Radoszewski, Jakub, Rytter, Wojciech, Waleń, Tomasz, Zuba, Wiktor

In the $k$-Edit Circular Pattern Matching ($k$-Edit CPM) problem, we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a positive integer threshold $k$, and we are to report all starting positions of the substrings of $T$ that are at edi

Externí odkaz: http://arxiv.org/abs/2402.14550

Zobrazit plný text záznamu

Report

Size-constrained Weighted Ancestors with Applications

Autor: Bille, Philip, Nekrich, Yakov, Pissis, Solon P.

The weighted ancestor problem on a rooted node-weighted tree $T$ is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require $\Ome

Externí odkaz: http://arxiv.org/abs/2311.15777

Zobrazit plný text záznamu

Report

Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Autor: Ayad, Lorraine A. K., Loukides, Grigorios, Pissis, Solon P., Verbeek, Hilde

Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in

Externí odkaz: http://arxiv.org/abs/2310.09023

Zobrazit plný text záznamu

Report

Gapped String Indexing in Subquadratic Space and Sublinear Query Time

Autor: Bille, Philip, Gørtz, Inge Li, Lewenstein, Moshe, Pissis, Solon P., Rotenberg, Eva, Steiner, Teresa Anna

In Gapped String Indexing, the goal is to compactly represent a string $S$ of length $n$ such that for any query consisting of two strings $P_1$ and $P_2$, called patterns, and an integer interval $[\alpha, \beta]$, called gap range, we can quickly f

Externí odkaz: http://arxiv.org/abs/2211.16860

Zobrazit plný text záznamu

Akademický článek

Pangenome comparison via ED strings

Autor: Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba

Publikováno v: Frontiers in Bioinformatics, Vol 4 (2024)

IntroductionAn elastic-degenerate (ED) string is a sequence of sets of strings. It can also be seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED strings was introduced as a simple alternative to variation and seque

Externí odkaz: https://doaj.org/article/0fbb0868164048a0936e75501aa9841c

Zobrazit plný text záznamu

Report

Elastic-Degenerate String Matching with 1 Error

Autor: Bernardini, Giulia, Gabory, Estéban, Pissis, Solon P., Stougie, Leen, Sweering, Michelle, Zuba, Wiktor

An elastic-degenerate string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrenc

Externí odkaz: http://arxiv.org/abs/2209.01095

Zobrazit plný text záznamu

Report

Approximate Circular Pattern Matching

Autor: Charalampopoulos, Panagiotis, Kociumaka, Tomasz, Radoszewski, Jakub, Pissis, Solon P., Rytter, Wojciech, Waleń, Tomasz, Zuba, Wiktor

We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragme

Externí odkaz: http://arxiv.org/abs/2208.08915

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání