Managing dbGaP Data with Stratus, a Research Cloud for Protected Data

Authors: Benjamin J. Lynch, Graham T. Allan, Mathew Mix, Edward A. Munsell, Evan F. Bollig, Brent Swartz, Naomi Hospodarsky, Joshua Leibfried, Yectli A. Huerta
Year of publication: 2017
Subject:
Source: PEARC
DOI: 10.1145/3093338.3104185
Description: Modern research computing needs at academic institutions are evolving. Traditional HPC has satisfied, and continues to satisfy, most workflows. However, a new generation of researchers wants sophisticated, on-demand, self-service control of compute infrastructure in a cloud-like environment. Many of these researchers also need policy-compliant safe spaces in which to compute on sensitive or protected data. To serve these users, the Minnesota Supercomputing Institute is deploying a cloud service for research computing called Stratus. In its initial iteration, Stratus is designed expressly to satisfy the requirements that the NIH Genomic Data Sharing (GDS) Policy sets for data from the Database of Genotypes and Phenotypes (dbGaP) [8]. Stratus runs on the Newton release of the OpenStack cloud platform and is backed by Ceph storage. The subscription-based service is currently running in beta-test mode. In addition to data protection and compliance, the service offers three features not available on traditional HPC systems: a) on-demand availability of compute resources; b) long-running jobs (i.e., > 30 days); and c) container-based computing with Docker. This document surveys the design of Stratus, with emphasis on security and compliance as they relate to managing dbGaP data. We also highlight end-user workflows for processing large data sets in the presence of multi-tiered cloud storage, including a special "dbGaP Cache" for staged data.
Database: OpenAIRE