Generating Synthetic RDF Data with Connected Blank Nodes for Benchmarking

Autor: Anastasia Analyti, Thanos Yannakis, Yannis Tzitzikas, Christina Lantzaki
Rok vydání: 2014
Předmět:
Zdroj: Lecture Notes in Computer Science ISBN: 9783319074429
ESWC
DOI: 10.1007/978-3-319-07443-6_14
Popis: Generators for synthetic RDF datasets are very important for testing and benchmarking various semantic data management tasks (e.g. querying, storage, update, compare, integrate). However, the current generators do not support sufficiently (or totally ignore) blank node connectivity issues. Blank nodes are used for various purposes (e.g. for describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, like isomorphism checking (useful for checking equivalence), and blank node matching (useful in comparison, versioning, synchronization, and in semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality depends on the connectivity of blank nodes. To enable the comparative evaluation of the various techniques for carrying out these tasks, in this paper we present the design and implementation of a generator, called BGen, which allows building datasets containing blank nodes with the desired complexity, controllable through various features (morphology, size, diameter, density and clustering coefficient). Finally, the paper reports experimental results concerning the efficiency of the generator, as well as results from using the generated datasets, that demonstrate the value of the generator.
Databáze: OpenAIRE