I/O-Efficient Data Structures For Non-Overlapping Indexing

Autor: Paniz Abedin, Sahar Hooshmand, Sharma V. Thankachan, M. Oğuzhan Külekci
Rok vydání: 2021
Předmět:
Popis: The non-overlapping indexing problem is defined as follows: pre-process a given text T [ 1 , n ] of length n into a data structure such that whenever a pattern P [ 1 , m ] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T . The best-known solution is by Cohen and Porat [ISAAC 2009]. The size of their structure is O ( n ) words and the query time is optimal O ( m + nocc ) , where nocc is the output size. Later, Ganguly et al. [CPM 2015 and Algorithmica 2020] proposed a compressed space solution. We study this problem in the cache-oblivious model and present a new data structure of size O ( n log ⁡ n ) words. It can answer queries in optimal O ( m B + log B ⁡ n + nocc B ) I/O operations, where B is the block size. The space can be improved to O ( n log M / B ⁡ n ) in the cache-aware model, where M is the size of main memory. Additionally, we study a generalization of this problem with an additional range [ s , e ] constraint. Here the task is to report the largest set of non-overlapping occurrences of P in T , that are within the range [ s , e ] . We present an O ( n log 2 ⁡ n ) space data structure in the cache-aware model that can answer queries in optimal O ( m B + log B ⁡ n + nocc [ s , e ] B ) I/O operations, where nocc [ s , e ] is the output size.
Databáze: OpenAIRE