Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Authors: Orenbuch R; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA., Kollasch AW; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA., Spinner HD; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA., Shearer CA; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA., Hopf TA; Scientific Consulting, 85435 Erding, Germany., Franceschi D; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA., Dias M; Dias & Frazer Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.; University Pompeu Fabra, Barcelona, Spain., Frazer J; Dias & Frazer Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.; University Pompeu Fabra, Barcelona, Spain., Marks DS; Marks Group, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Language: English
Source: Research square [Res Sq] 2024 Jan 04. Date of Electronic Publication: 2024 Jan 04.
DOI: 10.21203/rs.3.rs-3740259/v1
Abstract: Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses because their effects are less straightforward than those of truncations or nonsense mutations. While computational prediction methods are increasingly successful for variants in known disease genes, they do not generalize well to other genes because their scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders, where conventional techniques that rely on repeated observations may not be applicable.
Competing Interests: Additional Declarations: There is NO Competing Interest.
Database: MEDLINE