Early Nested Word Automata for XPath Query Answering on XML Streams

Autor: Olivier Gauwin, Denis Debarbieux, Tom Sebastian, Mohamed Zergaoui, Joachim Niehren
Přispěvatelé: Linking Dynamic Data (LINKS), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Innovimax [Paris], ANR-14-CE25-0017,Aggreg,Requêtes d'Agrégation(2014), Niehren, Joachim, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Linking Dynamic Data (LINKS ), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), innovimax
Jazyk: angličtina
Rok vydání: 2015
Předmět:
Nested word
Theoretical computer science
General Computer Science
computer.internet_protocol
Computer science
Logic
[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
XSLT
0102 computer and information sciences
02 engineering and technology
computer.software_genre
01 natural sciences
Automata
Theoretical Computer Science
Trees
Databases
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
020204 information systems
0202 electrical engineering
electronic engineering
information engineering

Xslt
XPath
Document processing
[INFO]Computer Science [cs]
Tree automaton
ComputingMilieux_MISCELLANEOUS
Computer Science::Databases
computer.programming_language
XPath 2.0
Programming language
Selection rule
Nested words
[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]
Xml
Automaton
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
XQuery
TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES
010201 computation theory & mathematics
TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS
Streams
Benchmark (computing)
Streaming algorithm
computer
Computer Science::Formal Languages and Automata Theory
XML
Zdroj: Theoretical Computer Science
Theoretical Computer Science, 2015, 578, pp.100-125. ⟨10.1016/j.tcs.2015.01.017⟩
Theoretical Computer Science, Elsevier, 2015, pp.100-125. ⟨10.1016/j.tcs.2015.01.017⟩
Implementation and Application of Automata ISBN: 9783642392733
CIAA
18th International Conference on Implementation and Application of Automata
18th International Conference on Implementation and Application of Automata, Jul 2013, Halifax, Canada. pp.292-305
ISSN: 1879-2294
0304-3975
DOI: 10.1016/j.tcs.2015.01.017⟩
Popis: International audience; Algorithms for answering XPath queries on Xml streams have been studied intensively in the last decade. Nevertheless, there still exists no solution with high efficiency and large coverage. In this paper, we introduce early nested word automata in order to approximate earliest query answering algorithms for nested word automata. Our early query answering algorithm is based on stack-and-state sharing for running early nested word automata on all answer candidates with on-the-fly determinization. We prove tight upper complexity bounds on time and space consumption. We have implemented our algorithm in the QuiXPath system and show that it outperforms all previous tools in coverage on the XPathMark benchmark, while obtaining very high time and space efficiency and scaling to huge Xml streams. Furthermore, it turns out that our early query answering algorithm is earliest in practice on most queries from the XPathMark benchmark.
Databáze: OpenAIRE