Modeling Protein Structure Using Geometric Vector Field Networks

Authors: Weian Mao, Muzhi Zhu, Hao Chen, Chunhua Shen
Year of publication: 2023
DOI: 10.1101/2023.05.07.539736
Description: Proteins serve as the foundation of life, and numerous diseases and challenges in the life sciences are intimately linked to the molecular dynamics laws concealed within protein structures. In this paper, we propose a novel vector field network (VFN) for modeling protein structure. Unlike previous methods, which extract geometric information by relying heavily on hand-crafted features such as the distances between atoms, VFN learns to extract geometric information, thus significantly improving accuracy and applicability. The core idea is that each residue in VFN maintains a group of hidden geometric vector representations in its residue local frame. When modeling the geometric relationship between two residues, the geometric vector representations of both residues are concatenated to form a vector field in one of the residue local frames. A geometric neural network can then be applied to that vector field to extract geometric information. Consequently, VFN is not only compatible with hand-crafted geometric features but can also discover other implicit geometric features. Furthermore, the geometric features required for some protein-related tasks are very complex, and hand-crafted features are prone to fail to encode all such information. VFN removes the model's dependence on such hand-crafted features, which makes it well suited and broadly generalizable to these challenging tasks. We evaluate VFN on the protein inverse folding task. Measured by sequence recovery score, VFN significantly improves on the state-of-the-art method, PiFold, by 2.9 percentage points (51.7% vs. 54.6%), and outperforms the recent strong baseline, ProteinMPNN, by 8.6 percentage points (46.0% vs. 54.6%). Furthermore, we scale up VFN with all known protein structure data; the resulting model achieves a recovery score of 57.1%.
Database: OpenAIRE
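
Illustrative note: the core idea in the description (per-residue vectors kept in a local frame, then gathered into a single frame to form a vector field) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Frame`, `local_to_global`, `global_to_local`, and `build_vector_field` are hypothetical, the vectors are assumed to behave like points expressed in a residue's local frame, and the learned layers that would consume the field are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Frame:
    """Hypothetical residue local frame: three orthonormal axes and an origin."""
    axes: List[Vec]  # axes[k] is the k-th local axis written in global coordinates
    origin: Vec      # frame origin in global coordinates


def local_to_global(frame: Frame, p: Vec) -> Vec:
    # global = origin + sum_k p[k] * axis_k
    return tuple(
        frame.origin[d] + sum(p[k] * frame.axes[k][d] for k in range(3))
        for d in range(3)
    )


def global_to_local(frame: Frame, q: Vec) -> Vec:
    # local[k] = axis_k . (q - origin); valid because the axes are orthonormal
    rel = tuple(q[d] - frame.origin[d] for d in range(3))
    return tuple(
        sum(frame.axes[k][d] * rel[d] for d in range(3)) for k in range(3)
    )


def build_vector_field(frame_i: Frame, vecs_i: List[Vec],
                       frame_j: Frame, vecs_j: List[Vec]) -> List[Vec]:
    """Express residue j's vectors in residue i's frame and concatenate them
    with residue i's own vectors (assumed form of the 'vector field')."""
    moved = [global_to_local(frame_i, local_to_global(frame_j, v)) for v in vecs_j]
    return list(vecs_i) + moved


if __name__ == "__main__":
    identity = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    frame_i = Frame(axes=identity, origin=(0.0, 0.0, 0.0))
    frame_j = Frame(axes=identity, origin=(1.0, 0.0, 0.0))
    field = build_vector_field(frame_i, [(0.0, 0.0, 1.0)], frame_j, [(0.0, 1.0, 0.0)])
    print(field)
```

In this sketch, a downstream network would see all vectors in one coordinate system, so quantities such as inter-residue distances are recoverable from the field rather than being supplied as hand-crafted inputs.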