Improving the Understanding of Provenance and Reproducibility of a Multi-Sensor Merged Climate Data Record

Authors: Brian Wilson, Gerald Manipon, Lei Pan, Eric J. Fetzer, Hook Hua
Year of publication: 2012
Source: Lecture Notes in Computer Science, ISBN: 9783642342219
IPAW
Description: Multi-decadal climate data records are critical to studying climate variability and change. Producing them often requires merging data from multiple instruments, such as those on NASA's A-Train, whose measurements cover a wide range of atmospheric conditions and phenomena. A multi-decadal climate data record of water vapor measurements from sensors on the A-Train, operational weather satellites, and other satellites is being assembled from existing data sources or produced with well-established methods published in the peer-reviewed literature. However, the immense volume and inhomogeneity of the data often require an "exploratory computing" approach to product generation, in which data is processed in a variety of ways, with varying algorithms, parameters, and code changes, until an acceptable product is generated. Furthermore, the information about source data, processing methods, parameters used, intermediate and final product outputs, and associated materials is often hidden in the individual trials and scattered throughout the processing system(s). We will present methods to help users better capture and explore the production legacy of the data, metadata, ancillary files, code, and computing environment changes used during the production of these merged, multi-sensor data products. By building provenance services on semantic and provenance technologies, we show how to leverage provenance-as-a-service to capture enough information for users to track processing, perform faceted searches on the provenance record, and visualize the provenance of the products and their processing lineage. We will also present services that capture sufficient provenance information and associated artifacts to enable some reproducibility of these climate data records.
Database: OpenAIRE