Experimental Uncertainty in Training Data for Protein-Ligand Binding Affinity Prediction Models

Authors:	Carlos A. Hernández-Garrido, Norberto Sánchez-Cruz
Language:	English
Year of publication:	2023
Subject:	Binding affinity prediction; Uncertainty estimation; Machine-learning; Science (General); Q1-390
Source:	Artificial Intelligence in the Life Sciences, Vol 4, Iss , Pp 100087- (2023)
Document type:	article
ISSN:	2667-3185
DOI:	10.1016/j.ailsci.2023.100087
Description:	The accuracy of machine learning models for protein-ligand binding affinity prediction depends on the quality of the experimental data they are trained on. Most of these models are trained and tested on different subsets of the PDBbind database, which is the main source of protein-ligand complexes with annotated binding affinity in the public domain. However, estimating its experimental uncertainty is not straightforward because only a few protein-ligand complexes have more than one associated measurement. In this work, we analyze bioactivity data from ChEMBL to estimate the experimental uncertainty associated with the three binding affinity measures included in PDBbind (Ki, Kd, and IC50), as well as the effect of combining them. The experimental uncertainty of combining these three affinity measures was characterized by a mean absolute error of 0.78 logarithmic units, a root mean square error of 1.04 logarithmic units, and a Pearson correlation coefficient of 0.76. These estimates were compared with the performance of state-of-the-art machine learning models for binding affinity prediction, showing that these models tend to be overoptimistic when evaluated on the core set from PDBbind.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/b2b50eea9d7a47edb99c8274fcadb9f7
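
The description summarizes experimental uncertainty with three statistics computed over repeated measurements: mean absolute error, root mean square error, and Pearson correlation. The sketch below shows one way such statistics could be computed from pairs of independent measurements of the same protein-ligand pair, expressed in negative log10 molar units. It is not the authors' pipeline: the function name `pairwise_uncertainty` and the input layout are assumptions made for illustration, and the example values are invented.

```python
import math


def pairwise_uncertainty(pairs):
    """Summarize disagreement between duplicate affinity measurements.

    `pairs` is a list of (a, b) tuples, where a and b are two independent
    measurements of the same protein-ligand pair in -log10(molar) units.
    Hypothetical helper; returns (mae, rmse, pearson_r).
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two measurement pairs")

    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    diffs = [a - b for a, b in pairs]

    # Mean absolute error and root mean square error of the differences.
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    # Pearson correlation between the first and second measurements.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant measurements")
    pearson_r = cov / math.sqrt(var_x * var_y)

    return mae, rmse, pearson_r


# Invented example values, for illustration only.
example_pairs = [(6.0, 6.5), (7.0, 6.0), (8.0, 8.5), (5.0, 5.0)]
mae, rmse, pearson_r = pairwise_uncertainty(example_pairs)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} r={pearson_r:.2f}")
```

Statistics obtained this way give a rough ceiling for model performance: a model whose reported error on a benchmark is lower than the disagreement between repeated experiments is likely being evaluated too optimistically.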