ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs

Authors:	Lupeng Kong, Fusong Ju, Wei-Mou Zheng, Jianwei Zhu, Shiwei Sun, Jinbo Xu, Dongbo Bu
Language:	English
Publication year:	2022
Subject:	Models, Molecular; Computer science; Protein Conformation; Amino Acid Motifs; Protein structure; Deep Learning; Sequence Analysis, Protein; Genetics; Homology modeling; Amino Acid Sequence; Molecular Biology; Research Articles; Computational Biology; Proteins; Pattern recognition; Protein structure prediction; Computational Mathematics; Computational Theory and Mathematics; Modeling and Simulation; Artificial intelligence; Neural Networks, Computer; Threading (protein sequence); Sequence motif; Sequence Alignment; Algorithms; Software
Source:	J Comput Biol
Description:	Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and templates with solved structures. However, building the optimal sequence-template alignment remains very challenging, especially when only distantly related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces the inaccuracies of the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer the likelihoods of all possible residue pairs in their entirety, which can effectively account for the correlations among multiple residues. We then construct the alignment with maximum likelihood and finally build a structure model according to the alignment. Tested on three independent datasets with a total of 6,688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader.
These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
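As an illustration of the binary-matrix representation and the maximum-likelihood construction mentioned in the description, the following is a minimal Python sketch, not the authors' implementation. It assumes the network output is a query-by-template matrix of per-pair likelihoods, treats the pairs as independent, and decodes an order-preserving alignment by dynamic programming over log-odds scores. The function names `alignment_to_matrix` and `decode_alignment`, the `eps` clipping parameter, and the toy values are all hypothetical.

```python
import math


def alignment_to_matrix(pairs, query_len, template_len):
    """Encode an alignment as a binary matrix: 1 marks an aligned residue pair."""
    matrix = [[0] * template_len for _ in range(query_len)]
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix


def decode_alignment(likelihood, eps=1e-9):
    """Return the order-preserving set of residue pairs with the highest
    total log-odds, given likelihood[i][j] = P(query i aligned to template j).

    Assumption (not from the source): pairs are scored independently, so the
    log-likelihood of an alignment equals a constant plus the sum of
    log(p / (1 - p)) over its aligned pairs.
    """
    n = len(likelihood)
    m = len(likelihood[0]) if n else 0

    def log_odds(p):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        return math.log(p / (1.0 - p))

    # best[i][j]: best score using the first i query and first j template residues
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],  # query residue i-1 left unaligned
                best[i][j - 1],  # template residue j-1 left unaligned
                best[i - 1][j - 1] + log_odds(likelihood[i - 1][j - 1]),
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return pairs


# Toy likelihood matrix (3 query residues x 4 template residues), made up for illustration.
toy = [
    [0.9, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
aligned = decode_alignment(toy)
print(aligned)                             # [(0, 0), (1, 2), (2, 3)]
print(alignment_to_matrix(aligned, 3, 4))  # [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Under the independence assumption, pairs with likelihood below 0.5 carry negative log-odds and are left unaligned, so gaps arise without an explicit gap penalty; the published method may score and decode alignments differently.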
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c8ead283d08885815bd136b2d28a3447 https://europepmc.org/articles/PMC8892980/