SDM: A Scientific Dataset Delivery Platform
Authors: | John H. Hartman, Larry L. Peterson, Illyoung Choi, Jude Nelson |
---|---|
Year of publication: | 2019 |
Subject: | 0301 basic medicine; Database; Computer science; business.industry; Interface (computing); 02 engineering and technology; computer.software_genre; 03 medical and health sciences; Task (computing); 030104 developmental biology; Data access; Workflow; Wide area network; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; The Internet; business; Cloud storage; computer; Data transmission |
Source: | eScience |
DOI: | 10.1109/escience.2019.00049 |
Description: | Scientific computing is becoming more data-centric and more collaborative, requiring increasingly large datasets to be transferred across the Internet. Transferring these datasets efficiently and making them accessible to scientific workflows is an increasingly difficult task. In addition, the data transfer time can be a significant portion of the overall workflow running time. This paper presents SDM (Syndicate Dataset Manager), a scientific dataset delivery platform. Unlike general-purpose data transfer tools, SDM offers on-demand access to remote scientific datasets. On-demand access doesn't require staging datasets to local file systems prior to computing on them, and it also enables overlapping computation and I/O. In addition, SDM offers a simple interface for users to locate and access datasets. To validate the usefulness of SDM, we performed realistic metagenomic sequence analysis workflows on remote genomic datasets. In general, SDM configured with a CDN outperforms existing data access methods. With warm CDN caches, SDM completes the workflow 17-20% faster than staging methods. Its performance is even comparable to local storage. SDM is only 9% slower than local HDD storage and 18% slower than local SSD storage. Together, its performance and its ease of use make SDM an attractive platform for performing scientific workflows on remote datasets. |
Database: | OpenAIRE |