Document retrieval with one wildcard
Authors: | Sharma V. Thankachan, J. Ian Munro, Moshe Lewenstein, Yakov Nekrich |
---|---|
Year: | 2016 |
Subject: | Information retrieval; General Computer Science; Query string; Computer science; Wildcard character; 0102 computer and information sciences; 02 engineering and technology; Extension (predicate logic); Data structure; 01 natural sciences; Substring; Theoretical Computer Science; Combinatorics; 010201 computation theory & mathematics; Symbol (programming); Wildcard; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Alphabet; Document retrieval |
Source: | Theoretical Computer Science. 635:94-101 |
ISSN: | 0304-3975 |
DOI: | 10.1016/j.tcs.2016.05.024 |
Description: | In this paper we extend several well-known document listing problems to the case when documents contain a substring that approximately matches the query pattern. We study the scenario where the query string can contain a wildcard symbol that matches any alphabet symbol; all documents that match a query pattern with one wildcard must be enumerated. We describe a linear-space data structure that reports all documents containing a substring P in O(\|P\| + σ log σ log log n + docc) time, where σ is the alphabet size and docc is the number of listed documents. We also describe a succinct solution for this problem, as well as a solution for an extension of this problem. Furthermore, our approach enables us to obtain an O(nσ)-space data structure that enumerates all documents containing both a pattern P1 and a pattern P2 in the special case when P1 and P2 differ in one symbol. |
Database: | OpenAIRE |
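The record describes the query semantics but not the index itself. As a hedged illustration of those semantics only, the Python sketch below is a brute-force baseline, not the authors' data structure: it expands the single wildcard into each of the σ alphabet symbols and scans every document directly. The wildcard character `?`, the function names `documents_matching` and `documents_with_both`, and the reading of "differ in one symbol" as equal length with exactly one mismatched position are all assumptions of this sketch.

```python
from typing import List, Set

# Hypothetical wildcard character for this sketch; the abstract does not fix one.
WILDCARD = "?"


def documents_matching(docs: List[str], pattern: str, alphabet: Set[str]) -> Set[int]:
    """Brute-force baseline: ids of documents with a substring matching `pattern`.

    The single WILDCARD position may match any symbol of `alphabet`, so the
    query is expanded into one concrete pattern per symbol and each document
    is scanned directly (no index is built).
    """
    wildcards = pattern.count(WILDCARD)
    if wildcards > 1:
        raise ValueError("pattern may contain at most one wildcard")
    if wildcards == 0:
        candidates = [pattern]
    else:
        # One concrete pattern per alphabet symbol: sigma candidates in total.
        candidates = [pattern.replace(WILDCARD, c) for c in sorted(alphabet)]
    return {
        doc_id
        for doc_id, text in enumerate(docs)
        if any(p in text for p in candidates)
    }


def documents_with_both(docs: List[str], p1: str, p2: str) -> Set[int]:
    """Brute-force baseline: ids of documents containing both p1 and p2.

    Assumes "differ in one symbol" means equal length with exactly one
    mismatched position (Hamming distance 1).
    """
    if len(p1) != len(p2) or sum(a != b for a, b in zip(p1, p2)) != 1:
        raise ValueError("p1 and p2 must have equal length and differ in exactly one position")
    return {doc_id for doc_id, text in enumerate(docs) if p1 in text and p2 in text}


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana"]
    print(documents_matching(docs, "ba?a", set("abcdn")))  # {0, 2}
    print(documents_with_both(docs, "ana", "and"))  # {1}
```

Here "ba?a" matches "bana" inside documents 0 and 2, while only "bandana" contains both "ana" and "and". This baseline makes roughly σ passes over the whole collection per query, which is exactly the cost an index like the one described in the abstract is meant to avoid; it is useful only as a correctness reference when experimenting with such structures.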