Using Introspection to Collect Provenance in R
Author: | Barbara Staudt Lerner, Emery R. Boose, Luis Perez |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: |
0301 basic medicine
Provenance scientific data provenance provenance capture provenance granularity R introspection Computer Networks and Communications Computer science media_common.quotation_subject computer.software_genre 03 medical and health sciences 0302 clinical medicine Software Programmer media_common Information retrieval Parsing lcsh:T58.5-58.64 business.industry lcsh:Information technology Communication Human-Computer Interaction 030104 developmental biology Scripting language Graph (abstract data type) Introspection business computer 030217 neurology & neurosurgery Interpreter |
Source: | Informatics, Vol. 5, Iss. 1, p. 12 (2018) |
ISSN: | 2227-9709 |
DOI: | 10.3390/informatics5010012 |
Description: | Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of, and confidence in, data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to the details on inputs, outputs, and the computing environment that most provenance tools collect, RDataTracker also records a detailed execution trace and intermediate data values. It does this by using R’s powerful introspection functions and by parsing each R statement before sending it to the interpreter, so that it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value was computed or how an input value was used. In this paper, we describe the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility. |
Database: | OpenAIRE |
External link: |
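The description names two mechanisms: parsing each statement before it reaches the interpreter to learn which values it reads and writes, and recording the result as a Data Derivation Graph that can be walked backward from an output. The sketch below illustrates that idea in Python, not R, using the standard `ast` module in place of R's parser. It is a hypothetical analogue, not RDataTracker's implementation. The names `reads_and_writes`, `DDG`, `run`, and `lineage`, as well as the node-naming scheme (`p1: ...` for procedure nodes, `name#version` for data nodes), are all assumptions made for illustration.

```python
import ast
from collections import defaultdict


def reads_and_writes(stmt_src):
    """Parse one statement *before* running it and report which variable
    names it reads (Load context) and which it assigns (Store context)."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(stmt_src)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)
    return reads, writes


class DDG:
    """Hypothetical minimal Data Derivation Graph: one procedure node per
    executed statement, one data node per version of each variable."""

    def __init__(self):
        self.inputs = defaultdict(list)  # procedure node -> data nodes it used
        self.producer = {}               # data node -> procedure node that created it
        self.latest = {}                 # variable name -> most recent data node
        self.versions = defaultdict(int) # variable name -> version counter
        self.values = {}                 # data node -> snapshot of intermediate value

    def run(self, script, env):
        for i, line in enumerate(script, start=1):
            proc = f"p{i}: {line}"
            reads, writes = reads_and_writes(line)
            # data -> procedure edges for every earlier value this statement uses
            # (names never assigned by the script, such as builtins, are skipped)
            for name in sorted(reads):
                if name in self.latest:
                    self.inputs[proc].append(self.latest[name])
            exec(line, {}, env)  # hand the statement to the interpreter
            # procedure -> data edges for every value the statement creates
            for name in sorted(writes):
                self.versions[name] += 1
                data = f"{name}#{self.versions[name]}"
                self.producer[data] = proc
                self.latest[name] = data
                self.values[data] = env[name]

    def lineage(self, data):
        """Walk backward from a data node and return every procedure node
        that contributed to it."""
        seen, stack = [], [data]
        while stack:
            proc = self.producer.get(stack.pop())
            if proc is None or proc in seen:
                continue
            seen.append(proc)
            stack.extend(self.inputs[proc])
        return seen


if __name__ == "__main__":
    script = [
        "raw = [3, 1, 4, 1, 5]",
        "total = sum(raw)",
        "n = len(raw)",
        "mean = total / n",
        "unused = 42",
    ]
    g = DDG()
    g.run(script, {})
    for step in g.lineage(g.latest["mean"]):
        print(step)
```

Walking backward from `mean` reaches the statements that defined `total`, `n`, and `raw`, but not `unused`, which is the kind of "how was this output computed" question the description attributes to the Data Derivation Graph. Keeping a separate data node per version (`x#1`, `x#2`, ...) is what lets a reassigned variable still point to the exact value an earlier step consumed.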