I/O-Efficient Data Structures For Non-Overlapping Indexing
Autor: | Paniz Abedin, Sahar Hooshmand, Sharma V. Thankachan, M. Oğuzhan Külekci |
---|---|
Rok vydání: | 2021 |
Předmět: |
General Computer Science
Generalization Search engine indexing Structure (category theory) 0102 computer and information sciences 02 engineering and technology Data structure Space (mathematics) 01 natural sciences Theoretical Computer Science Combinatorics Set (abstract data type) 010201 computation theory & mathematics 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Block size Range (computer programming) Mathematics |
Popis: | The non-overlapping indexing problem is defined as follows: pre-process a given text T [ 1 , n ] of length n into a data structure such that whenever a pattern P [ 1 , m ] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T . The best-known solution is by Cohen and Porat [ISAAC 2009]. The size of their structure is O ( n ) words and the query time is optimal O ( m + nocc ) , where nocc is the output size. Later, Ganguly et al. [CPM 2015 and Algorithmica 2020] proposed a compressed space solution. We study this problem in the cache-oblivious model and present a new data structure of size O ( n log n ) words. It can answer queries in optimal O ( m B + log B n + nocc B ) I/O operations, where B is the block size. The space can be improved to O ( n log M / B n ) in the cache-aware model, where M is the size of main memory. Additionally, we study a generalization of this problem with an additional range [ s , e ] constraint. Here the task is to report the largest set of non-overlapping occurrences of P in T , that are within the range [ s , e ] . We present an O ( n log 2 n ) space data structure in the cache-aware model that can answer queries in optimal O ( m B + log B n + nocc [ s , e ] B ) I/O operations, where nocc [ s , e ] is the output size. |
Databáze: | OpenAIRE |
Externí odkaz: |