Distributed evaluation of XPath queries over large integrated XML data

Authors: Manolis Gergatsoulis, Matthew Damigos, Eleftherios Kalogeros
Year of publication: 2014
Subject:
Source: Panhellenic Conference on Informatics
DOI: 10.1145/2645791.2645804
Description: XML is a widespread, text-based format used for exchanging information on the Web and representing metadata. Since the amount of XML information is rapidly increasing, efficient querying of large repositories containing XML data is a significant challenge for system designers and data analysts who need to support operational actions and decision-making. In this paper we propose a technique for integrating large amounts of XML data and use the Map-Reduce framework to query the integrated data efficiently. Each XML document obtained from the sources is transformed to fit into a predefined, virtual XML structure. Although the transformed documents are not physically integrated, the user is able to pose queries over a single XML structure. To achieve this, we propose a single-step Map-Reduce algorithm that takes advantage of the virtual structure and efficiently computes the answer to a given XPath query in a distributed manner.
Database: OpenAIRE