A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis
Author: | K. A. Druken, Jingbo Wang, Rui Yang, C. J. Richards, B. J. K. Evans, Lesley Wyborn |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: |
netCDF; Standardization; Computer Networks and Communications; Computer science; Interoperability; quality assurance; 010502 geochemistry & geophysics; 01 natural sciences; fair data; benchmarks; data quality; quality control; data management policy; 0105 earth and related environmental sciences; lcsh:T58.5-58.64; lcsh:Information technology; Communication; 05 social sciences; performance; high performance computing; HPC; Benchmarking; Data structure; Data science; Human-Computer Interaction; Data access; 0509 other social sciences; Web service; 050904 information & library sciences |
Source: | Informatics, Vol 4, Iss 4, p 45 (2017) |
ISSN: | 2227-9709 |
Description: | To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardizing both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) Consistency of data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) Benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and have demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access. |
Database: | OpenAIRE |
External link: |