Inflated expectations: Rare-variant association analysis using public controls.

Authors: Kim J; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Karyadi DM; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Hartley SW; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Zhu B; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Wang M; Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.; Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America., Wu D; Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.; Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America., Song L; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Armstrong GT; Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America., Bhatia S; Institute for Cancer Outcomes and Survivorship, University of Alabama at Birmingham, Birmingham, Alabama, United States of America., Robison LL; Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America., Yasui Y; Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America., Carter B; Department of Population Science, American Cancer Society, Atlanta, Georgia, United States of America., Sampson JN; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Freedman ND; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Goldstein AM; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Mirabello L; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Chanock SJ; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Morton LM; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Savage SA; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America., Stewart DR; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.
Language: English
Source: PloS one [PLoS One] 2023 Jan 25; Vol. 18 (1), pp. e0280951. Date of Electronic Publication: 2023 Jan 25 (Print Publication: 2023).
DOI: 10.1371/journal.pone.0280951
Abstract: The use of publicly available sequencing datasets as controls (hereafter, "public controls") in studies of rare variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to an inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls (gnomAD v2.1) and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to false-positive discovery, as measured by λΔ95, a measure that quantifies the degree of inflation in statistical significance. Analyses of the datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were employed, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate, and 3) joint vs. separate variant-calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for the high false-positive discovery. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests in which individual-level data and the computational pipeline are not readily accessible, which prevents the use of the same variant-calling and filtering pipelines on both cases and controls. A plausible solution exists with the emergence of cloud-based computing, which can make it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline) and could avert or minimize these issues. It is suggested that future reports account for this issue and state it as a limitation when reporting new findings based on studies that cannot practically analyze all data on a single pipeline.
Competing Interests: All authors have declared no competing interests.
(Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.)
Database: MEDLINE