From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction.

Authors: Gorantla R; School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.; EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, U.K., Kubincová A; Exscientia, Schrödinger Building, Oxford, OX4 4GE, U.K., Weiße AY; School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.; School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, U.K., Mey ASJS; EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, U.K.
Language: English
Source: Journal of chemical information and modeling [J Chem Inf Model] 2024 Apr 08; Vol. 64 (7), pp. 2496-2507. Date of Electronic Publication: 2023 Nov 20.
DOI: 10.1021/acs.jcim.3c01208
Abstract: Accurate in silico prediction of protein-ligand binding affinity is important in the early stages of drug discovery. Deep learning-based methods exist but have yet to overtake more conventional methods such as giga-docking, largely due to their lack of generalizability. To improve generalizability, we need to understand what these models learn from input protein and ligand data. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on predicting binding affinities for commonly used kinase data sets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. Ligand-based encodings are generated from graph neural networks. We test different ligand perturbations by randomizing node and edge properties. For proteins, we use three different protein contact map generation methods (AlphaFold2, Pconsc4, and ESM-1b) and compare these with a random control. Our investigation shows that protein encodings do not substantially impact the binding predictions: for KIBA, there is no statistically significant difference in predicted binding affinity across the investigated metrics (concordance index, Pearson's R, Spearman's rank correlation, and RMSE). Significant differences are seen for ligand encodings with random ligands and random ligand node properties, suggesting a much bigger reliance on ligand data for the learning tasks. Different ways of combining protein and ligand encodings did not significantly change performance.
Database: MEDLINE