Precise Data Identification Services for Long Tail Research Data

Author: Proell, Stefan; Meixner, Kristof; Rauber, Andreas
Year: 2016
DOI: 10.6084/m9.figshare.3847632
Description: While sophisticated research infrastructures assist scientists in managing massive volumes of data, the so-called long tail of research data frequently suffers from a lack of such services. This is mostly due to the complexity caused by the variety of data to be managed and a lack of easily standardisable procedures in highly diverse research settings. Yet, as even domains in this long tail of research data are increasingly data-driven, scientists need efficient means to precisely communicate which version and subset of data was used in a particular study, in order to enable reproducibility and comparability of results and to foster data re-use. This paper presents three implementations of systems supporting such data identification services for comma-separated values (CSV) files, a dominant format for data exchange in these settings. The implementations are based on the recommendations of the Working Group on Dynamic Data Citation of the Research Data Alliance (RDA). They provide implicit change tracking of all data modifications, while precise subsets are identified via the respective subsetting process. This enhances the reproducibility of experiments and allows efficient sharing of specific subsets of data even in highly dynamic data settings.
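The general idea behind the RDA dynamic data citation approach (versioned data with timestamps, plus a query store that keeps each subsetting query, its execution timestamp and a hash of its result under an identifier) can be illustrated with a minimal sketch. This is not any of the three implementations described in the paper; the classes `VersionedTable` and `QueryStore`, the single-column equality query, and the `pid-N` identifier format are hypothetical simplifications chosen for illustration only.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    key: str
    values: Dict[str, str]
    valid_from: int                  # logical timestamp when this version appeared
    valid_to: Optional[int] = None   # None means the version is still current


class VersionedTable:
    """Hypothetical append-only CSV store: old row versions are closed, never deleted."""

    def __init__(self, key_column: str):
        self.key_column = key_column
        self.versions: List[RowVersion] = []

    def _current(self, key: str) -> Optional[RowVersion]:
        for v in self.versions:
            if v.key == key and v.valid_to is None:
                return v
        return None

    def upsert(self, values: Dict[str, str], ts: int) -> None:
        old = self._current(values[self.key_column])
        if old is not None:
            old.valid_to = ts        # implicit change tracking: close the old version
        self.versions.append(RowVersion(values[self.key_column], dict(values), ts))

    def delete(self, key: str, ts: int) -> None:
        old = self._current(key)
        if old is not None:
            old.valid_to = ts        # deletion only ends the validity interval

    def load_csv(self, text: str, ts: int) -> None:
        for row in csv.DictReader(io.StringIO(text)):
            self.upsert(row, ts)

    def as_of(self, ts: int) -> List[Dict[str, str]]:
        """Reconstruct the rows that were valid at timestamp ts."""
        return [v.values for v in self.versions
                if v.valid_from <= ts and (v.valid_to is None or ts < v.valid_to)]


def run_query(rows: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    # Deterministic (stable) ordering so that the result hash is reproducible.
    hits = [r for r in rows if r.get(column) == value]
    return sorted(hits, key=lambda r: sorted(r.items()))


def result_hash(rows: List[Dict[str, str]]) -> str:
    h = hashlib.sha256()
    for r in rows:
        h.update(repr(sorted(r.items())).encode("utf-8"))
    return h.hexdigest()


class QueryStore:
    """Keeps query, execution timestamp and result hash under an identifier."""

    def __init__(self):
        self.entries = {}

    def cite(self, table: VersionedTable, column: str, value: str, ts: int) -> str:
        result = run_query(table.as_of(ts), column, value)
        pid = "pid-%d" % (len(self.entries) + 1)   # placeholder, not a real PID scheme
        self.entries[pid] = (column, value, ts, result_hash(result))
        return pid

    def resolve(self, table: VersionedTable, pid: str) -> List[Dict[str, str]]:
        column, value, ts, expected = self.entries[pid]
        # Re-execute the stored query against the data as it was at citation time.
        result = run_query(table.as_of(ts), column, value)
        if result_hash(result) != expected:
            raise ValueError("cited subset could not be reproduced")
        return result


if __name__ == "__main__":
    table = VersionedTable(key_column="id")
    table.load_csv("id,site,temp\n1,A,20\n2,B,21\n3,A,19\n", ts=1)
    store = QueryStore()
    pid = store.cite(table, "site", "A", ts=2)
    table.upsert({"id": "3", "site": "A", "temp": "25"}, ts=3)   # later correction
    table.delete("1", ts=4)                                       # later deletion
    print(store.resolve(table, pid))   # the site-A rows as they were at ts=2
```

In this sketch the cited subset stays retrievable after later corrections and deletions, because the data are never overwritten and the stored query is re-run against the timestamped state; the hash comparison detects whether the reconstructed subset matches the one originally cited.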
Database: OpenAIRE