Fostering the democratization of research data by using the Annotated Research Context (ARC) as practical implementation

Author: Venn, Benedikt; Schneider, Kevin; Frey, Kevin; Weil, H. Lukas; Werner, Johannes; Wannenmacher, Fabian; Zajac, Thomas; von Suchodoletz, Dirk; Usadel, Björn; Krüger, Jens; Garth, Christoph; Mühlhaus, Timo
Language: English
Publication year: 2021
Subject:
Description: Research in modern life science increasingly depends on the exchange of interdisciplinary expertise, on collaboration, and on the reuse and integration of large data sets. Advancing digitization in particular opens up new possibilities for scientific knowledge acquisition, especially for the fundamental plant research community. However, challenges remain, specifically in capturing the entire research cycle, including the contextualization of data according to the FAIR and linked open data principles, for the DataPLANT (https://nfdi4plants.de/) community and beyond. Here, we propose a data structure dubbed the Annotated Research Context (ARC, https://github.com/nfdi4plants/ARC), which captures the complete research cycle in a structured way and meets the FAIR requirements with low friction for the individual researcher. ARCs are self-contained and include assay and measurement data, workflows, and computation results, accompanied by metadata, in one package. Their structure gives users full control over all metadata and facilitates usability, access, publication, and sharing of the research. ARCs are thereby a practical implementation of existing standards, leveraging the advantages of the ISA model, research crates, and the Common Workflow Language. The ARC concept relies on a structure that partitions assays, workflows, and results for granular reuse and development. Assays cover biological, experimental, and instrumental data together with their self-contained description using the ISA model. Similarly, workflows describe all digital steps of a study and contain application code, scripts, and/or any other executable description of an analysis, providing the highest degree of flexibility for scientists. Further, to ensure persistence and reproducibility, workflows include their own containerized running environment. The result data are linked to the workflows by a minimal Common Workflow Language file specifying the workflow input and output.
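The partition into assays, workflows, and results described above can be illustrated with a minimal sketch. The directory names (`assays`, `workflows`, `runs`), the file name `workflow.cwl`, the script name `analysis.py`, and the helper `scaffold_arc` are assumptions made for illustration only; they are not taken from this abstract and should not be read as the normative ARC layout. The embedded file shows what a minimal Common Workflow Language description naming one input and one output might look like.

```python
from pathlib import Path

# A minimal CWL tool description that names one input and one output.
# The script name and the result file name are hypothetical.
MINIMAL_CWL = """\
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [python, analysis.py]
inputs:
  assay_data:
    type: File
    inputBinding:
      position: 1
outputs:
  result:
    type: File
    outputBinding:
      glob: result.csv
"""


def scaffold_arc(root: Path, assay: str, workflow: str) -> list[str]:
    """Create an illustrative ARC-like layout under `root`.

    Returns the sorted relative paths of everything that exists below
    `root` afterwards. The layout is an assumption, not the ARC specification.
    """
    root = Path(root)
    directories = [
        root / "assays" / assay / "dataset",  # measurement data of one assay
        root / "workflows" / workflow,        # code and workflow description
        root / "runs" / workflow,             # results produced by the workflow
    ]
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)

    # The CWL file links the results to the workflow via its inputs and outputs.
    cwl_path = root / "workflows" / workflow / "workflow.cwl"
    cwl_path.write_text(MINIMAL_CWL, encoding="utf-8")

    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Calling `scaffold_arc(Path("my_arc"), "proteomics", "quantify")` would create the three partitions and place the workflow description next to the (hypothetical) analysis code, so each part can be versioned and reused separately.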
The suggested structure for ARCs is a starting point for individual research projects and defines a framework for the organization, sharing, versioning, reuse (clone), and evolution (fork/pull request) of research projects in a manner familiar from open-source software development. ARCs will form the basis of our collaborative research platform, the DataPLANT Hub, and, owing to their decentralized design, will also provide an interface to existing infrastructure, aiming at compatibility with public services and existing repositories. Additionally, the DataPLANT community will be able to handle ARCs on the de.NBI Cloud and the Storage-for-Science RDM system and to compute on the bwForCluster BinAC, the de.NBI Cloud, and Galaxy resources. In the future, we envision ARC publications as a central component of knowledge and data communication and sharing that can be referenced by classical journal publications. As part of the ARC vision, we will discuss mechanisms for measuring data and metadata quality.
Database: OpenAIRE