HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards
Author: | Kristofer E. Bouchard, Oliver Rubel, Ryan Ly, Loren M. Frank, Benjamin Dichter, Edward F. Chang, Andrew Tritt, Donghe Kang |
---|---|
Contributors: | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Hu, Ronay Ak, Yuanyuan Tian, Roger S. Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publication year: | 2019 |
Subject: | Standardization; Computer science; HDF5; Hierarchical Data Format; Hierarchical database model; Data modeling; Metadata; Data standards; Computer data storage; Data formats; Use case; Neurophysiology; Software engineering; Article; 03 medical and health sciences; 0301 basic medicine; 030104 developmental biology; 0302 clinical medicine; 030217 neurology & neurosurgery |
Source: | IEEE BigData Proc IEEE Int Conf Big Data |
DOI: | 10.1109/bigdata47090.2019.9005648 |
Description: | A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimization of data storage via support for chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0 [13], a modern data standard for collaborative science across the neurophysiology community. |
Database: | OpenAIRE |