Prediction of toluene/water partition coefficients in the SAMPL9 blind challenge: assessment of machine learning and IEF-PCM/MST continuum solvation models.

Authors: Zamora WJ; CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica. william.zamoraramirez@ucr.ac.cr.; Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica.; Advanced Computing Lab (CNCA), National High Technology Center (CeNAT), Pavas, San José, Costa Rica., Viayna A; Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain. fjluque@ub.edu.; Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain.; Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain., Pinheiro S; CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica. william.zamoraramirez@ucr.ac.cr.; Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica., Curutchet C; Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain.; Departament de Farmàcia i Tecnologia Farmacèutica, i Fisicoquímica, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Joan XXIII 27-31, 08028, Barcelona, Spain., Bisbal L; Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain.; Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain. crafols@ub.edu., Ruiz R; Pion Inc., Forest Row Business Park, Forest Row RH18 5DW, UK., Ràfols C; Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain.; Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain. crafols@ub.edu., Luque FJ; Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain. fjluque@ub.edu.; Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain.; Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain.
Language: English
Source: Physical chemistry chemical physics : PCCP [Phys Chem Chem Phys] 2023 Jul 12; Vol. 25 (27), pp. 17952-17965. Date of Electronic Publication: 2023 Jul 12.
DOI: 10.1039/d3cp01428b
Abstract: In recent years, the use of partition systems other than the widely used biphasic n-octanol/water has received increased attention to gain insight into the molecular features that dictate the lipophilicity of compounds. Thus, the difference between n-octanol/water and toluene/water partition coefficients has proven to be a valuable descriptor to study the propensity of molecules to form intramolecular hydrogen bonds and exhibit chameleon-like properties that modulate solubility and permeability. In this context, this study reports the experimental toluene/water partition coefficients (log P(tol/w)) for a series of 16 drugs that were selected as an external test set in the framework of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenge. This external set has been used by the computational community to calibrate their methods in the current edition (SAMPL9) of this contest. Furthermore, the study also investigates the performance of two computational strategies for the prediction of log P(tol/w). The first relies on the development of two machine learning (ML) models, which are built up by combining the selection of 11 molecular descriptors in conjunction with either the multiple linear regression (MLR) or the random forest regression (RFR) model to target a dataset of 252 experimental log P(tol/w) values. The second consists of the parametrization of the IEF-PCM/MST continuum solvation model from B3LYP/6-31G(d) calculations to predict the solvation free energies of 163 compounds in toluene and benzene. The performance of the ML and IEF-PCM/MST models has been calibrated against external test sets, including the compounds that define the SAMPL9 log P(tol/w) challenge. The results are used to discuss the merits and weaknesses of the two computational approaches.
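Note: the abstract does not spell out how solvation free energies from a continuum model translate into a partition coefficient. A minimal sketch of the standard thermodynamic relation, log P = (ΔG_solv,water − ΔG_solv,toluene) / (RT ln 10), is given below. It assumes a neutral solute, ignores mutual saturation of the two phases, and takes T = 298.15 K; the function name and the numerical inputs are hypothetical and are not taken from the study.

```python
import math

# Gas constant in kcal/(mol*K); free energies below are assumed to be in kcal/mol.
R_KCAL = 1.987204e-3


def log_p_from_solvation(dg_water, dg_toluene, temperature=298.15):
    """Estimate log P (toluene/water) from two solvation free energies.

    A more negative solvation free energy in toluene than in water means
    the solute prefers the organic phase, giving a positive log P.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10)
    return (dg_water - dg_toluene) / rt_ln10


# Hypothetical solvation free energies (kcal/mol), for illustration only.
example = log_p_from_solvation(dg_water=-5.0, dg_toluene=-7.0)
print(f"log P (toluene/water) = {example:.2f}")
```

At 298.15 K the denominator is about 1.36 kcal/mol, so each 1.36 kcal/mol of preference for toluene shifts log P by roughly one unit.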
Database: MEDLINE