EraSOR: a software tool to eliminate inflation caused by sample overlap in polygenic score analyses.

Authors: Choi SW; Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York City, NY 10029, USA.; MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK., Mak TSH; Centre of Genomic Sciences, University of Hong Kong, Pokfulam, Hong Kong SAR, China., Hoggart CJ; Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York City, NY 10029, USA., O'Reilly PF; Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York City, NY 10029, USA.; MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK.
Language: English
Source: GigaScience [Gigascience] 2022 Dec 28; Vol. 12. Date of Electronic Publication: 2023 Jun 16.
DOI: 10.1093/gigascience/giad043
Abstract: Background: Polygenic risk score (PRS) analyses are now routinely applied across biomedical research. However, as PRS studies grow in size, there is an increased risk of sample overlap between the genome-wide association study (GWAS) from which the PRS is derived and the "target sample," in which PRSs are computed and hypotheses are tested. Despite the wide recognition of the sample overlap problem, its potential impact on the results from PRS studies has not yet been quantified, and no analytical solution has been provided.
Findings: Here, we first conduct a comprehensive investigation into the scale of the sample overlap problem, finding that PRS results can be substantially inflated even in the presence of minimal overlap. Next, we introduce a method and software, EraSOR (Erase Sample Overlap and Relatedness), which eliminates the inflation caused by sample overlap (and close relatedness) in almost all settings tested here.
Conclusions: EraSOR could be useful in PRS studies similar to those investigated here (with a target sample size >1,000), either (i) to mitigate the potential effects of known or unknown intercohort overlap and close relatedness or (ii) as a sensitivity tool, to highlight the possible presence of sample overlap before removing it directly where possible, or otherwise to provide a lower bound on PRS analysis results after accounting for potential sample overlap.
(© The Author(s) 2023. Published by Oxford University Press GigaScience.)
Database: MEDLINE