FAIR privacy-preserving operation of large genomic variant calling format (VCF) data without download or installation.

Authors: Martins YC; National Laboratory of Scientific Computing, Petrópolis, Brazil., Bhawsar PM; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA., Balasubramanian JB; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA., Russ D; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA., Wong WS; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA., Maass W; Saarland University, 66123 Saarbrücken, Germany., Almeida JS; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA.
Language: English
Source: AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science [AMIA Jt Summits Transl Sci Proc] 2024 May 31; Vol. 2024, pp. 65-74. Date of Electronic Publication: 2024 May 31 (Print Publication: 2024).
Abstract: Motivation: The proliferation of genetic testing and consumer genomics represents a logistical challenge to the personalized use of GWAS data in VCF format, specifically the challenge of retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
(©2024 AMIA - All rights reserved.)
Database: MEDLINE
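
Illustrative sketch (not part of the record): the abstract describes a binary search over byte ranges of a position-sorted VCF file with no index. The JavaScript below is a minimal sketch of that general idea under stated assumptions and is not the authors' module. It assumes an uncompressed file sorted by numeric chromosome and position, and it leaves out the on-the-fly decompression step entirely. All names (`makeRangeReader`, `findDataStart`, `findFirstAtOrAfter`, `buildDemoVcf`, `CHUNK`) are hypothetical, and an in-memory buffer stands in for the HTTP `Range` requests a browser would issue.

```javascript
// Bytes per simulated range request (kept tiny so the demo file needs many probes).
const CHUNK = 64;
const decoder = new TextDecoder();

// Hypothetical reader: returns bytes [start, end) of the "remote" file.
// In a browser, read() could instead call
//   fetch(url, { headers: { Range: `bytes=${start}-${end - 1}` } })
function makeRangeReader(bytes) {
  let requests = 0;
  return {
    size: bytes.length,
    count: () => requests,
    read: async (start, end) => {
      requests++;
      return bytes.subarray(start, Math.min(end, bytes.length));
    },
  };
}

// Offset of the first line that starts strictly after `offset` (or the file size).
async function nextLineStart(reader, offset) {
  for (let at = offset; at < reader.size; at += CHUNK) {
    const nl = (await reader.read(at, at + CHUNK)).indexOf(10); // 10 is "\n"
    if (nl !== -1) return at + nl + 1;
  }
  return reader.size;
}

// Parse CHROM and POS of the line starting at `offset`.
// Assumes both fields fit within the first CHUNK bytes of the line.
async function keyAt(reader, offset) {
  const text = decoder.decode(await reader.read(offset, offset + CHUNK));
  const [chrom, pos] = text.split("\t");
  return { chrom: Number(chrom), pos: Number(pos) };
}

// True when record key `k` sorts before `t` (chromosome first, then position).
const before = (k, t) =>
  k.chrom < t.chrom || (k.chrom === t.chrom && k.pos < t.pos);

// Skip the "#"-prefixed header lines and return the offset of the first record.
async function findDataStart(reader) {
  let at = 0;
  while (at < reader.size) {
    const first = await reader.read(at, at + 1);
    if (first[0] !== 35) break; // 35 is "#"
    at = await nextLineStart(reader, at);
  }
  return at;
}

// Binary search on byte offsets, then a short linear scan.
// Returns the first record at or after `target`, or null if there is none.
async function findFirstAtOrAfter(reader, dataStart, target) {
  let lo = dataStart; // always a line start; every record before it sorts before target
  let hi = reader.size;
  while (hi - lo > 4 * CHUNK) {
    const mid = Math.floor((lo + hi) / 2);
    const s = await nextLineStart(reader, mid); // realign the probe to a record boundary
    if (s < reader.size && before(await keyAt(reader, s), target)) lo = s;
    else hi = mid;
  }
  for (let at = lo; at < reader.size; at = await nextLineStart(reader, at)) {
    const key = await keyAt(reader, at);
    if (!before(key, target)) return { offset: at, ...key };
  }
  return null;
}

// Synthetic demo data (not real variants): chromosomes 1 and 2, positions 100..50000.
function buildDemoVcf() {
  const lines = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT"];
  for (const chrom of [1, 2]) {
    for (let i = 1; i <= 500; i++) {
      lines.push(`${chrom}\t${i * 100}\tvar${chrom}_${i}\tA\tG`);
    }
  }
  return new TextEncoder().encode(lines.join("\n") + "\n");
}
```

A possible use would be to create a reader with `makeRangeReader(buildDemoVcf())`, locate the first record with `findDataStart(reader)`, and then call `findFirstAtOrAfter(reader, start, { chrom: 2, pos: 12345 })`. Because a probe can land mid-record, each one is realigned to the next newline before its key is read. Chunk-compressed files would additionally require locating and inflating the enclosing compressed chunk at each probe, which this sketch does not attempt.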