Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems
Author: | Moore, Reagan W., Goodall, Jonathan L., Xu, Hao, Billah, Mirza M., Essawy, Bakinam T., Kugler, Tracy A., Whitton, Mary C., Rajasekar, Arcot, Myers, James D. |
---|---|
Year of publication: | 2016 |
Subject: |
Computational model; Data processing; 010504 meteorology & atmospheric sciences; Data grid; Computer science; business.industry; Data management; 0208 environmental biotechnology; Interoperability; 02 engineering and technology; Environmental Science (miscellaneous); 01 natural sciences; 020801 environmental engineering; Cyberinfrastructure; Workflow; General Earth and Planetary Sciences; Data pre-processing; business; Software engineering; 0105 earth and related environmental sciences |
Source: | Earth and Space Science. 3:163-175 |
ISSN: | 2333-5084 |
DOI: | 10.1002/2015ea000139 |
Description: | Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.
Executing computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO).
We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States. |
Database: | OpenAIRE |
External link: |