Improved Pathogenic Variant Localization via a Hierarchical Model of Sub-regional Intolerance
Authors: | Sitharthan Kamalakaran, Brett Copeland, Andrew S. Allen, Charles J. Wolock, Tristan J. Hayeck, David Goldstein, Nicholas Stong |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
0303 health sciences
Models, Genetic; Bayes Theorem; Computational biology; Disease; Exons; Biology; Disease cluster; Hierarchical database model; Article; 03 medical and health sciences; Negative selection; 0302 clinical medicine; Gene; Components; Genetic variation; Mutation; Genetics; OMIM: Online Mendelian Inheritance in Man; Bayesian hierarchical modeling; Humans; Gene; Spasms, Infantile; 030217 neurology & neurosurgery; Genetics (clinical); 030304 developmental biology |
Description: | Different parts of a gene can differ in their importance to development and health. This regional heterogeneity is also apparent in the distribution of disease-associated mutations, which often cluster in particular regions of disease-associated genes. Precisely estimating the functionally important sub-regions of genes will be key to correctly deciphering the relationship between genetic variation and disease. Previous methods have had some success in using standing human variation to characterize this variability in importance by measuring sub-regional intolerance, i.e., the depletion of functional variation, relative to expectation, within a given region of a gene. However, the precision of local intolerance estimates has been limited because only the information within a given sub-region is used, which makes local estimates unstable, especially for small regions. We show that borrowing information across regions with a Bayesian hierarchical model stabilizes the estimates, reducing their variability and improving their predictive utility. Specifically, our approach more effectively identifies regions enriched for ClinVar pathogenic variants. We also identify significant correlations between sub-regional intolerance and the distribution of pathogenic variation in disease-associated genes, with AUCs for classifying de novo missense variants in Online Mendelian Inheritance in Man (OMIM) genes of up to 0.86 using exonic sub-regions and 0.91 using sub-regions defined by protein domains. This result suggests that considering the intolerance of the regions in which variants are found may improve diagnostic interpretation. Finally, we illustrate the utility of integrating regional intolerance into gene-level disease association tests with a study of known disease-associated genes for epileptic encephalopathy. |
Database: | OpenAIRE |
External link: |
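The abstract does not spell out the model. Still, the idea of "borrowing information across regions" can be illustrated with a generic empirical-Bayes gamma-Poisson sketch. This is an assumption made for illustration and is not the authors' published model. Each sub-region i is assumed to have an observed count O_i of functional variants and an expected count E_i, with O_i ~ Poisson(E_i * lambda_i) and a gene-wide Gamma(alpha, beta) prior on lambda_i fitted by the method of moments. The function names (`raw_oe`, `fit_gamma_prior`, `shrunk_oe`) and the counts are hypothetical.

```python
import numpy as np


def raw_oe(observed, expected):
    """Per-region observed/expected ratio using only that region's own data."""
    return observed / expected


def fit_gamma_prior(observed, expected, min_var=1e-6):
    """Method-of-moments fit of a Gamma(alpha, beta) prior on the per-region
    rate lambda_i, assuming O_i ~ Poisson(E_i * lambda_i) (illustrative model)."""
    m = observed.sum() / expected.sum()  # pooled, gene-level O/E
    r = observed / expected
    # Under the Poisson model Var(r_i) is roughly Var(lambda) + m / E_i, so
    # subtract the average Poisson noise to estimate between-region variance.
    tau2 = max(float(np.var(r) - np.mean(m / expected)), min_var)
    alpha = m * m / tau2  # prior mean alpha / beta equals m
    beta = m / tau2       # prior variance alpha / beta**2 equals tau2
    return alpha, beta


def shrunk_oe(observed, expected):
    """Posterior mean of lambda_i under the conjugate Gamma prior:
    (alpha + O_i) / (beta + E_i)."""
    alpha, beta = fit_gamma_prior(observed, expected)
    return (alpha + observed) / (beta + expected)


# Hypothetical counts for four sub-regions of one gene (illustrative only).
observed = np.array([10.0, 55.0, 70.0, 0.0])
expected = np.array([40.0, 50.0, 60.0, 3.0])
print("raw O/E:   ", np.round(raw_oe(observed, expected), 3))
print("shrunk O/E:", np.round(shrunk_oe(observed, expected), 3))
```

With these made-up counts, the tiny fourth region (3 expected variants, 0 observed) has the most extreme raw O/E of 0, but the posterior pulls it toward the gene-level ratio, to roughly 0.55. The large regions move only slightly. The first region's low ratio rests on far more data, so it becomes the least tolerant region after shrinkage. This is the kind of stabilization of small-region estimates that the abstract describes, shown here only under the stated assumptions.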