Gathering data from different Research infrastructures: lessons learnt from the soil water content use case of the ENVRI-Fair project

Authors: André Chanzy, Giovanni L'Abate, Lucia Vaira, Xeni Kechagioglou, Christian Pichot, Alberto Basset, Dario Papale
Year of publication: 2023
DOI: 10.5194/egusphere-egu23-14305
Description: Soil water content (SWC) is a key variable in many ecosystem processes, such as vegetation dynamics, biogeochemical cycles, the water balance and soil physical properties. However, SWC shows very strong spatial variation linked to the heterogeneity of the soil, the vegetation, climatic conditions and relief. It also varies in time, following the dynamics of plant cover and climate. In situ measurement methods are very local and therefore require large, expensive and intrusive sampling. Modelling therefore remains an essential tool for representing soil moisture at the scales of interest for many applications (hydrology, ecology, agronomy). Models are nevertheless relatively complex and require a large number of contextual variables, such as the local climate, plant cover and its rooting, soil properties and topography. It is therefore important to accompany soil moisture measurements with the acquisition of contextual variables that allow the measurements to be interpreted and that feed the models. For this purpose, research infrastructures offer a favourable framework that complements existing networks by producing quality soil moisture measurements and by describing the context variables. A use case dedicated to SWC was conducted within the ENVRI-FAIR project, involving the main environmental Research Infrastructures that have sites measuring SWC: AnaEE, eLTER, LifeWatch, ICOS, DANUBIUS and SIOS. The first step of the use case was to identify users’ needs by defining criteria for identifying relevant datasets and the metadata useful for documenting them. A survey was carried out and received about 100 answers. The main foreseen uses are environmental model calibration, data assimilation, remote sensing product calibration, global change studies and environmental monitoring. To use SWC data, the main information users expect is soil characteristics, geolocation and ecosystem type.
Concerning the availability of contextual variables, climate, soil physical characteristics and ecosystem management were mentioned as the most important. From that survey, a semantic model was proposed to determine and name the main metadata used by the querying tools and the dataset description. The EML standard was used for metadata discovery, and the LifeWatch ERIC Metadata Catalogue was used to collect a dataset from each Research Infrastructure involved and to ensure interoperability. Although the metadata fields provided by the EML standard describe the SWC datasets sufficiently, they showed limitations for advanced queries on the existence of contextual variables or on the dataset exploitation metadata, and thus a dedicated portal was developed for advanced searches. The DCAT model and its extension developed within ENVRI-FAIR were preferred whenever possible, and all metadata were gathered in an RDF triple store. The metadata collection flows and the alignment of the vocabularies were important issues. The availability of the metadata through machine-to-machine processes was analysed, and recommendations were given to data providers on annotating their datasets. These concern the dataset description itself but also the site description, which holds part of the useful information. The vocabulary alignment effort was evaluated considering both the automatic alignments and the remaining manual ones.
Database: OpenAIRE