ProvAnalyser: A Framework for Scientific Workflows Provenance

Authors: Peter Fitch, Anila Sahar Butt
Publication year: 2021
Subject:
Source: Communications in Computer and Information Science ISBN: 9783030674441
MODELSWARD (Revised Selected Papers)
DOI: 10.1007/978-3-030-67445-8_5
Description: The growing capability of data-driven science is increasing the need for applications controlled by data-centric workflows, also known as scientific workflows. This work focuses on provenance collection for these workflows, which is necessary to validate a workflow and to determine the quality of the data products it generates. However, instrumenting a workflow engine for provenance collection is burdensome: it requires adding hooks to the engine to capture provenance, which can perturb execution. We address the challenge of extracting provenance data, in the form of a knowledge graph, from workflow event logs to record critical information about the applications and the workflows. We present an ontology-based framework for provenance collection using the event logs of the workflow engine. Further, we reduce provenance use cases to SPARQL queries over the captured provenance knowledge graph. Performance evaluation demonstrates that the framework can reconstruct complete data and invocation dependency graphs from one or more execution traces.
Database: OpenAIRE