sPLINK: A Federated, Privacy-Preserving Tool as a Robust Alternative to Meta-Analysis in Genome-Wide Association Studies
Authors: | Reza Nasirigerdeh, Reihaneh Torkzadehmahani, Julian Matschinske, Tobias Frisch, Markus List, Julian Späth, Stefan Weiß, Uwe Völker, Dominik Heider, Nina Kerstin Wenke, Tim Kacprowski, Jan Baumbach |
---|---|
Year of publication: | 2020 |
Subject: | 0303 health sciences; 020205 medical informatics; Computer science; Genome-wide association study; 02 engineering and technology; Gold standard (test); Summary statistics; Set (abstract data type); 03 medical and health sciences; Sample size determination; Meta-analysis; 0202 electrical engineering, electronic engineering, information engineering; Data mining; Raw data; 030304 developmental biology; Genetic association |
Source: | bioRxiv |
DOI: | 10.1101/2020.06.05.136382 |
Description: | Genome-wide association studies (GWAS) have been widely used to unravel connections between genetic variants and diseases. Larger sample sizes in GWAS can lead to the discovery of more associations and to more accurate genetic predictors. However, sharing and combining distributed genomic data to increase the sample size is often challenging or even impossible due to privacy concerns and privacy protection laws such as the GDPR. While meta-analysis has been established as an effective approach to combining summary statistics from several GWAS, its accuracy can be attenuated in the presence of cross-study heterogeneity. Here, we present sPLINK (safe PLINK), a user-friendly tool that performs federated GWAS on distributed datasets while preserving the privacy of the data and the accuracy of the results. sPLINK neither exchanges raw data nor relies on summary statistics. Instead, it performs model training in a federated manner, communicating only model parameters between cohorts and a central server. We verify that the federated results from sPLINK are the same as those from aggregated analyses conducted with PLINK. We demonstrate that sPLINK is robust against heterogeneous distributions of data (phenotype and confounding factors) across cohorts, while existing meta-analysis tools lose considerable accuracy in such scenarios. We also show that sPLINK achieves practical runtime, on the order of minutes or hours, and acceptable network bandwidth consumption for chi-square and linear/logistic regression tests. Federated analysis with sPLINK thus has the potential to replace meta-analysis as the gold standard for collaborative GWAS. The user-friendly, readily usable sPLINK tool is available at https://exbio.wzw.tum.de/splink. |
Database: | OpenAIRE |
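
The abstract states that federated results match those of an aggregated analysis while only aggregate values, not raw genotypes, leave each cohort. The sketch below illustrates that idea for an allelic chi-square test. It is not sPLINK's actual protocol, which this record does not describe: the function names, the toy cohorts, and the choice to exchange 2x2 allele-count tables are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical data model, not taken from sPLINK:
# a sample is (minor-allele count 0/1/2, phenotype: 1 = case, 0 = control).
Sample = Tuple[int, int]
# A table is (case minor, case major, control minor, control major) allele counts.
Table = Tuple[int, int, int, int]


def local_allele_counts(samples: List[Sample]) -> Table:
    """Runs inside one cohort; only this 2x2 count table is shared."""
    a = b = c = d = 0
    for minor, is_case in samples:
        major = 2 - minor  # each diploid sample carries two alleles
        if is_case:
            a += minor
            b += major
        else:
            c += minor
            d += major
    return a, b, c, d


def aggregate(tables: List[Table]) -> Table:
    """Runs on the central server: element-wise sum of the cohort tables."""
    return tuple(sum(column) for column in zip(*tables))


def chi_square(table: Table) -> float:
    """Pearson chi-square statistic for a 2x2 table (1 degree of freedom)."""
    a, b, c, d = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (a * d - b * c) ** 2 / denominator


# Toy cohorts (assumed data) with deliberately different case/control ratios.
cohort_a = [(0, 0), (1, 1), (2, 1), (0, 0), (1, 0), (2, 1)]
cohort_b = [(2, 1), (1, 1), (0, 0), (0, 1)]

federated_stat = chi_square(
    aggregate([local_allele_counts(cohort_a), local_allele_counts(cohort_b)])
)
pooled_stat = chi_square(local_allele_counts(cohort_a + cohort_b))
print(federated_stat == pooled_stat)
```

Because the count tables add exactly, the federated statistic is identical to the one computed on pooled raw data, whatever the differences in phenotype distribution between cohorts. The regression tests mentioned in the abstract would need iterative exchange of model parameters rather than a single round of counts.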