FAIR data and metadata – The X-omics FAIR Data Cube and its added value for multi-omics researchers

Authors: Niehues, Anna; Liao, XiaoFeng; Brandt, Martin; Doorbos, Cenna; Ederveen, Tom; Hagenbeek, Fiona; Huang, Junda; Kulkarni, Purva; van der Velde, K. Joeri; de Visser, Casper; van Vliet, Michael; 't Hoen, Peter A. C.
Language: English
Year of publication: 2022
DOI: 10.5281/zenodo.6783399
Description: The FAIR (Findable, Accessible, Interoperable and Reusable) principles were proposed [1] to guide researchers in describing and sharing their data, with the aim of increasing data reuse and research reproducibility. Creating FAIR data can be challenging for multi-omics researchers because tooling is lacking and the landscape of (meta)data standards is diverse and differs across omics types. Linked data structures and graph representations allow semantic queries and open up new possibilities for data analysis. However, large multi-omics data sets cannot easily be converted to such structures. In the Netherlands X-omics Initiative, we develop a FAIR Data Cube (FDCube) [2], a set of tools and services that help researchers at different stages of the Research Data Life Cycle, including creating and describing new data, and finding, understanding and reusing existing FAIR multi-omics data. To facilitate the creation of FAIR multi-omics data and metadata, we collaborate with different initiatives such as the FAIR Genomes project [3]. We adopt and develop metadata schemas for different omics data types, and use the Investigation-Study-Assay (ISA) metadata framework [4] to capture experimental metadata. Example workflows to create such metadata are publicly shared [5]. Researchers can find and query multi-omics studies via a FAIR Data Point (FDP) instance [6], which links to public or access-protected data repositories. A set of accompanying tools supports importing general study metadata into the FDP and running semantic queries on additional metadata on samples, phenotypes, or molecular features represented in an RDF-based knowledge graph. To allow analysis of access-protected data, we further implement a vantage6-based architecture that allows bioinformaticians to send containerised computing requests to access-controlled omics data storage and receive aggregated results.
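The idea of sending a computation to access-controlled storage and receiving only aggregated results can be illustrated with a minimal sketch. It does not use the vantage6 API and is not the FDCube implementation; the names `PartialResult`, `local_task` and `aggregate`, as well as the two example sites and their values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class PartialResult:
    """Summary statistics that are allowed to leave a data station."""
    n: int
    total: float


def local_task(values: Sequence[float]) -> PartialResult:
    # Runs next to the protected data (hypothetical stand-in for a
    # containerised request); only the count and the sum are returned,
    # never the individual measurements.
    return PartialResult(n=len(values), total=sum(values))


def aggregate(partials: Iterable[PartialResult]) -> float:
    # Runs on the researcher's side; combines per-site summaries
    # into a pooled mean without seeing any raw values.
    partials = list(partials)
    n = sum(p.n for p in partials)
    if n == 0:
        raise ValueError("no observations received from any site")
    return sum(p.total for p in partials) / n


# Hypothetical measurements held at two separate access-controlled sites.
site_a = [2.0, 4.0, 6.0]
site_b = [10.0]

partials = [local_task(site_a), local_task(site_b)]
pooled_mean = aggregate(partials)
print(pooled_mean)
```

The pooled mean equals the mean that would be obtained from the combined raw data, although only two numbers per site were exchanged. A real deployment would additionally need authentication, container execution, and safeguards against disclosure from very small sites.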
A prototype FDCube implementation is being developed in collaboration with the Trusted World of Corona (TWOC) [7], in which we use public COVID-19 multi-omics data sources to demonstrate the strength and added value of the FDCube and its FAIR-based methodologies. We invite researchers to discuss with us their own experiences, how the FDCube can facilitate their research, and how X-omics tools can further support them.
References:
[1] M. Wilkinson, M. Dumontier, I. Aalbersberg et al. "The FAIR Guiding Principles for scientific data management and stewardship". Sci Data 3:160018, 2016. https://doi.org/10.1038/sdata.2016.18
[2] https://github.com/Xomics/FAIRDataCube
[3] K.J. van der Velde, G. Singh, R. Kaliyaperumal et al. "FAIR Genomes metadata schema promoting Next Generation Sequencing data reuse in Dutch healthcare and research". Sci Data 9:169, 2022. https://doi.org/10.1038/s41597-022-01265-x
[4] S.-A. Sansone, P. Rocca-Serra, D. Field et al. "Toward interoperable bioscience data". Nat Genet 44:121–126, 2012. https://doi.org/10.1038/ng.1054
[5] https://github.com/Xomics/ISA-ACTION-Template
[6] https://fdp.x-omics.nl, hosted by SURF
[7] https://health-holland.com/project/2020/trusted-world-of-corona
Database: OpenAIRE