Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery.

Authors: Zhang X; National Heart & Lung Institute, Imperial College London, London, UK. xiaolei@ebi.ac.uk.; MRC Laboratory of Medical Sciences, Imperial College London, London, UK. xiaolei@ebi.ac.uk.; Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK. xiaolei@ebi.ac.uk.; Present address: European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK. xiaolei@ebi.ac.uk., Theotokis PI; National Heart & Lung Institute, Imperial College London, London, UK.; MRC Laboratory of Medical Sciences, Imperial College London, London, UK.; Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK., Li N; National Heart & Lung Institute, Imperial College London, London, UK.; MRC Laboratory of Medical Sciences, Imperial College London, London, UK.; Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK., Wright CF; Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, UK., Samocha KE; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA., Whiffin N; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. nwhiffin@well.ox.ac.uk.; Centre for Human Genetics, University of Oxford, Oxford, UK. nwhiffin@well.ox.ac.uk.; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK. nwhiffin@well.ox.ac.uk., Ware JS; National Heart & Lung Institute, Imperial College London, London, UK. j.ware@imperial.ac.uk.; MRC Laboratory of Medical Sciences, Imperial College London, London, UK. j.ware@imperial.ac.uk.; Royal Brompton & Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK. j.ware@imperial.ac.uk.; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. j.ware@imperial.ac.uk.
Language: English
Source: Genome medicine [Genome Med] 2024 Jul 11; Vol. 16 (1), pp. 88. Date of Electronic Publication: 2024 Jul 11.
DOI: 10.1186/s13073-024-01358-9
Abstract: Background: One of the major hurdles in clinical genetics is interpreting the clinical consequences of germline missense variants in humans. Recent advances have leveraged natural variation observed in large-scale human populations to uncover genes or genomic regions that show a depletion of natural variation, indicative of selection pressure. We refer to this as "genetic constraint". Although existing genetic constraint metrics have successfully prioritised genes or genomic regions associated with disease, their limited spatial resolution restricts their ability to distinguish pathogenic from benign variants within a gene.
Methods: We aim to identify missense variants that are significantly depleted in the general human population. Given the size of currently available human populations with exome or genome sequencing data, it is not possible to detect depletion of individual missense variants directly, since at most positions the expected number of observations of a given variant is less than one. We instead focus on protein domains, grouping homologous variants with similar functional impacts and examining the depletion of natural variation within these comparable sets. To accomplish this, we develop the Homologous Missense Constraint (HMC) score. We use the Genome Aggregation Database (gnomAD) 125K exome sequencing data and evaluate genetic constraint at quasi-amino-acid resolution by combining signals across protein homologues.
Results: We identify one million possible missense variants under strong negative selection within protein domains. Although our approach annotates only protein domains, it nonetheless allows us to assess 22% of the exome confidently. It precisely distinguishes pathogenic from benign variants for both early-onset and adult-onset disorders. It outperforms existing constraint metrics and pathogenicity meta-predictors in prioritising de novo mutations from probands with developmental disorders (DD). It is also methodologically independent of these tools, adding power to predict variant pathogenicity when used in combination with them. We demonstrate its utility for gene discovery by identifying seven genes with a new significant association with DD that could act through an altered-function mechanism.
Conclusions: Grouping variants of comparable functional impact is effective for evaluating their genetic constraint. HMC is a novel and accurate predictor of missense consequence for improved variant interpretation.
(© 2024. The Author(s).)
Database: MEDLINE