Early Nested Word Automata for XPath Query Answering on XML Streams

Autor:	Olivier Gauwin, Denis Debarbieux, Tom Sebastian, Mohamed Zergaoui, Joachim Niehren
Přispěvatelé:	Linking Dynamic Data (LINKS), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Innovimax [Paris], ANR-14-CE25-0017,Aggreg,Requêtes d'Agrégation(2014), Niehren, Joachim, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Linking Dynamic Data (LINKS ), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), innovimax
Jazyk:	angličtina
Rok vydání:	2015
Předmět:	Nested word Theoretical computer science General Computer Science computer.internet_protocol Computer science Logic [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing XSLT 0102 computer and information sciences 02 engineering and technology computer.software_genre 01 natural sciences Automata Theoretical Computer Science Trees Databases [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] 020204 information systems 0202 electrical engineering electronic engineering information engineering Xslt XPath Document processing [INFO]Computer Science [cs] Tree automaton ComputingMilieux_MISCELLANEOUS Computer Science::Databases computer.programming_language XPath 2.0 Programming language Selection rule Nested words [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG] Xml Automaton [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing XQuery TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES 010201 computation theory & mathematics TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS Streams Benchmark (computing) Streaming algorithm computer Computer Science::Formal Languages and Automata Theory XML
Zdroj:	Theoretical Computer Science Theoretical Computer Science, 2015, 578, pp.100-125. ⟨10.1016/j.tcs.2015.01.017⟩ Theoretical Computer Science, Elsevier, 2015, pp.100-125. ⟨10.1016/j.tcs.2015.01.017⟩ Implementation and Application of Automata ISBN: 9783642392733 CIAA 18th International Conference on Implementation and Application of Automata 18th International Conference on Implementation and Application of Automata, Jul 2013, Halifax, Canada. pp.292-305
ISSN:	1879-2294 0304-3975
DOI:	10.1016/j.tcs.2015.01.017⟩
Popis:	International audience; Algorithms for answering XPath queries on Xml streams have been studied intensively in the last decade. Nevertheless, there still exists no solution with high efficiency and large coverage. In this paper, we introduce early nested word automata in order to approximate earliest query answering algorithms for nested word automata. Our early query answering algorithm is based on stack-and-state sharing for running early nested word automata on all answer candidates with on-the-fly determinization. We prove tight upper complexity bounds on time and space consumption. We have implemented our algorithm in the QuiXPath system and show that it outperforms all previous tools in coverage on the XPathMark benchmark, while obtaining very high time and space efficiency and scaling to huge Xml streams. Furthermore, it turns out that our early query answering algorithm is earliest in practice on most queries from the XPathMark benchmark.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b84d9f711b1ed644334f7851a91f0a5f https://inria.hal.science/hal-00966625/document Zobrazit plný text záznamu Full Text from ScienceDirect