A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge
Authors: | Slade Matthews, Davy Guan, Raymond Lui |
---|---|
Year of publication: | 2019 |
Subject: | Multilinear map, Quantitative structure–activity relationship, 010304 chemical physics, Property (programming), Quantitative Structure-Activity Relationship, Water, 01 natural sciences, Chemical space, Regression, 0104 chemical sciences, Computer Science Applications, 010404 medicinal & biomolecular chemistry, Models, Chemical, Solubility, 0103 physical sciences, Drug Discovery, Lipophilicity, Feature (machine learning), Physical and Theoretical Chemistry, Representation (mathematics), Biological system, Protein Kinase Inhibitors, Protein Kinases, Mathematics |
Source: | Journal of Computer-Aided Molecular Design, 34(5) |
ISSN: | 1573-4951 |
Description: | Effective representation of a molecule is required to develop useful quantitative structure-property relationships (QSPR) for accurate prediction of chemical properties. The octanol-water partition coefficient, logP, is a measure of lipophilicity and an important property for pharmacological and toxicological endpoints used in the pharmaceutical and regulatory spheres. We compare physicochemical descriptors, structural keys, and circular fingerprints in their ability to represent a chemical space effectively and to characterise molecular features that correlate with lipophilicity. Exploratory landscape continuity analyses revealed that whole-molecule physicochemical descriptors mapped compounds that were similar in both molecular features and logP close to one another, indicating higher potential for use in logP QSPRs than the substructural approach of structural keys and circular fingerprints. Indeed, logP QSPR models parameterised by physicochemical descriptors consistently had the lowest error. Our best-performing model was a stochastic gradient descent-optimised multilinear regression with 1438 descriptors, which returned an internal benchmark RMSE of 1.03 log units. This corroborates the well-established notion that lipophilicity is an additive, whole-molecule property. We externally tested the model by participating in the 2019 SAMPL6 logP Prediction Challenge, blindly predicting logP for 11 fragment-like protein kinase inhibitor molecules. Our model returned an RMSE of 0.49 log units, placing eighth overall and third in the empirical methods category (submission ID 'hdpuj'). Permutation feature importance analyses revealed that physicochemical descriptors could characterise predictive molecular features highly relevant to the fragment-like kinase inhibitor molecules. |
Database: | OpenAIRE |
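The description names three techniques: a multilinear regression optimised by stochastic gradient descent, RMSE as the error metric, and permutation feature importance. The sketch below illustrates them in standard-library Python under stated assumptions. It uses plain per-sample SGD on squared error and defines importance as the mean increase in RMSE when one feature column is shuffled. The function names (`fit_sgd`, `permutation_importance`), the learning rate, the epoch count, and the synthetic three-feature data are hypothetical illustrations, not the authors' code, descriptors, or data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error (for logP, expressed in log units)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def predict(weights, bias, x):
    """Multilinear (additive) model: y_hat = b + sum_j w_j * x_j."""
    return bias + sum(w * xj for w, xj in zip(weights, x))


def fit_sgd(X, y, lr=0.01, epochs=200, seed=0):
    """Fit a multilinear regression with plain per-sample SGD on squared error.

    lr and epochs are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            err = predict(weights, bias, X[i]) - y[i]
            # Gradient of 0.5 * err**2 w.r.t. w_j is err * x_j; w.r.t. b it is err.
            for j in range(n_features):
                weights[j] -= lr * err * X[i][j]
            bias -= lr * err
    return weights, bias


def permutation_importance(weights, bias, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(weights, bias, x) for x in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, column)]
            score = rmse(y, [predict(weights, bias, x) for x in X_perm])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Synthetic toy data (NOT from the paper): three standardised "descriptors",
    # of which only the first two contribute to the target.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    y = [1.5 * x[0] - 0.8 * x[1] + rng.gauss(0, 0.1) for x in X]

    weights, bias = fit_sgd(X, y)
    preds = [predict(weights, bias, x) for x in X]
    print("Weights:", [round(w, 3) for w in weights], "bias:", round(bias, 3))
    print("RMSE:", round(rmse(y, preds), 3))
    print("Permutation importance:",
          [round(v, 3) for v in permutation_importance(weights, bias, X, y)])
```

On this toy data the shuffled irrelevant third feature should barely change the RMSE, while the first feature, which has the largest weight, should raise it most. Because the model is additive, each weight can be read as one descriptor's contribution, which mirrors the "additive, whole-molecule property" reading in the description.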