ChaperoNet: Distillation of Language Model Semantics to Folded Three-Dimensional Protein Structures

Author: dos Santos Costa, Allan
Year of publication: 2021
Document type: Master's thesis
Description: Determining the structure of proteins has been a long-standing goal in biology. Language models have recently been deployed to capture the evolutionary semantics of protein sequences and, as an emergent property, were found to be structural learners. Enriched with multiple sequence alignments (MSA), these transformer models were able to capture significant information about a protein’s tertiary structure. In this work, we show how such structural information can be recovered by processing language model embeddings, and introduce a two-stage folding pipeline to directly estimate three-dimensional folded structures from protein sequences. We envision that this pipeline will provide a basis for efficient, end-to-end protein structure prediction through protein language modeling.
S.M.
Database: Networked Digital Library of Theses & Dissertations