A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity

Authors: Yibo Fan, Jason Flannick
Language: English
Year of publication: 2021
Subject:
Source: Journal of the Endocrine Society
ISSN: 2472-1972
Description: Genetic studies have yielded only a limited number of genes clearly implicated in endocrine disorders, in large part because of two current knowledge gaps. First, genome-wide association studies (GWAS) of common diseases have yielded many associations that are hard to translate to causal genes and pathways. Second, whole exome sequencing (WES) studies have transformed the diagnosis of rare diseases but often yield many variants of unknown significance that cannot yet be reliably prioritized for disease causality. We hypothesized that phenotypically similar diseases are more likely to share causal genes and pathways. Thus, genes implicated in one disease (rare or common) should be strong candidates to also contribute to a phenotypically similar disease. To test this hypothesis, we aggregated (a) genes for 3,209 rare diseases from OMIM and (b) genes near GWAS signals for 2,316 common diseases from the NHGRI/EBI GWAS catalog. We measured phenotypic similarity based on proximity in the Experimental Factor Ontology (EFO). Across ~2.7M common disease pairs, the number of shared genes increased with phenotypic similarity (Spearman p < 0.1). Similarly, across ~7.4M common-rare disease pairs and ~5.1M rare disease pairs, phenotypic similarity was significantly higher for disease pairs with at least one shared gene than for those with no shared genes (t-test p < 0.05). We next developed an algorithm to predict genes for a rare disease based on its phenotypic similarity to other diseases and their known genes. Given a rare disease, the algorithm (a) identifies nearby diseases in the EFO; (b) collates their known genes and groups them into gene ontology (GO) terms; and (c) predicts the genes that occur in the most frequently observed GO term as potentially novel disease genes. We evaluated algorithm performance via cross-validation on rare diseases in OMIM.
Across 140 rare endocrine diseases, the algorithm predicted on average 4.84 candidate genes, and the correct disease gene (known but hidden by cross-validation) was among the candidates 23.6% of the time; performance (5.11 candidates, 13.1% success rate) was similar for the other 3,069 rare diseases in OMIM. Examples include leprechaunism (known gene INSR), for which INSR and TWIST2 were predicted based on phenotypic similarity to Barber-Say syndrome, Rabson-Mendenhall syndrome, and gingival fibromatosis-hypertrichosis syndrome, and Lubinsky syndrome (no known genes), for which ABCD1, LMNA, and CNBP were predicted based on phenotypic similarity to Ricker syndrome, X-ALD, DM1, Malouf syndrome, and Noonan syndrome. These data suggest that known phenotypic relationships and disease-gene databases can increase our ability to predict novel genes for less well-studied diseases, potentially speeding the biological translation of GWAS associations for common diseases and increasing the diagnostic yield of WES for rare diseases.
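The three-step prediction procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `predict_genes`, the dictionary-based inputs, and the placeholder identifiers (`D1`, `G1`, `GO:A`) are all hypothetical. It assumes the nearby diseases from step (a) have already been identified in the EFO, that a GO term's frequency is the number of distinct collated genes annotated to it, and that ties are broken alphabetically; the abstract does not specify any of these details.

```python
from collections import Counter


def predict_genes(target, neighbors, disease_genes, gene_go_terms):
    """Predict candidate genes for `target` from its phenotypic neighbors.

    neighbors      -- diseases assumed to be near `target` in the ontology (step a)
    disease_genes  -- dict: disease -> set of known genes
    gene_go_terms  -- dict: gene -> set of GO terms annotated to that gene
    """
    # Step (b), part 1: collate the known genes of the nearby diseases.
    # The target's own genes are never consulted, which mimics hiding
    # them during cross-validation.
    collated = set()
    for disease in neighbors:
        if disease != target:
            collated |= disease_genes.get(disease, set())

    # Step (b), part 2: group the collated genes by GO term, counting how
    # many distinct genes fall under each term (an assumed frequency measure).
    term_counts = Counter()
    for gene in collated:
        for term in gene_go_terms.get(gene, set()):
            term_counts[term] += 1

    if not term_counts:
        return set()

    # Step (c): pick the most frequent GO term (alphabetical tie-break,
    # also an assumption) and return the collated genes annotated to it.
    top_term = min(term_counts, key=lambda t: (-term_counts[t], t))
    return {g for g in collated if top_term in gene_go_terms.get(g, set())}


# Toy data with placeholder identifiers (not real diseases, genes, or GO terms).
toy_neighbors = ["D1", "D2", "D3"]
toy_disease_genes = {"D1": {"G1", "G2"}, "D2": {"G2", "G3"}, "D3": {"G4"}}
toy_gene_go_terms = {
    "G1": {"GO:A"},
    "G2": {"GO:A", "GO:B"},
    "G3": {"GO:A"},
    "G4": {"GO:B"},
}

predicted = predict_genes("D0", toy_neighbors, toy_disease_genes, toy_gene_go_terms)
print(sorted(predicted))
```

In the toy data, `GO:A` covers three collated genes and `GO:B` covers two, so the genes annotated to `GO:A` are returned as candidates. A cross-validation run in this style would check whether the hidden gene of each target disease appears in the returned set.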
Database: OpenAIRE