Author: |
Yin R; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115.; Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32610., Gutierrez A; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115., Kobren SN; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115., Avillach P; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115. |
Language: |
English |
Source: |
MedRxiv : the preprint server for health sciences [medRxiv] 2024 Apr 20. Date of Electronic Publication: 2024 Apr 20. |
DOI: |
10.1101/2024.04.15.24305876 |
Abstract: |
Rare and ultra-rare genetic conditions are estimated to impact nearly 1 in 17 people worldwide, yet accurately pinpointing the diagnostic variants underlying each of these conditions remains a formidable challenge. Because comprehensive, in vivo functional assessment of all possible genetic variants is infeasible, clinicians instead consider in silico variant pathogenicity predictions to distinguish plausibly disease-causing from benign variants across the genome. However, in the most difficult undiagnosed cases, such as those accepted to the Undiagnosed Diseases Network (UDN), existing pathogenicity predictions cannot reliably discern true etiological variant(s) from other deleterious candidate variants that were prioritized through N-of-1 efforts. Pinpointing the disease-causing variant from a pool of plausible candidates remains a largely manual effort requiring extensive clinical workups, functional and experimental assays, and eventual identification of genotype- and phenotype-matched individuals. Here, we introduce VarPPUD, a tool trained on prioritized variants from UDN cases, that leverages gene-, amino acid-, and nucleotide-level features to discern pathogenic variants from other deleterious variants that are unlikely to be confirmed as disease relevant. VarPPUD achieves a cross-validated accuracy of 79.3% and precision of 77.5% on a held-out subset of uniquely challenging UDN cases, respectively representing an average 18.6% and 23.4% improvement over nine traditional pathogenicity prediction approaches on this task. We also validate VarPPUD's ability to discriminate likely from unlikely pathogenic variants on synthetic, GAN-generated candidate variants. Finally, we show how VarPPUD can be probed to evaluate each input feature's importance and contribution toward prediction, an essential step toward understanding the distinct characteristics of newly uncovered disease-causing variants. |
Database: |
MEDLINE |