Evaluation of QSAR Equations for Virtual Screening
Author: | Jacob Spiegel, Hanoch Senderowitz |
---|---|
Publication year: | 2020 |
Subject: |
0301 basic medicine; ERG1 Potassium Channel; Support Vector Machine; Databases Pharmaceutical; Computer science; Drug Evaluation Preclinical; Quantitative Structure-Activity Relationship; random forest (RF); computer.software_genre; 01 natural sciences; lcsh:Chemistry; Receptor Serotonin 5-HT2C; lcsh:QH301-705.5; Spectroscopy; QSAR equations; multiple linear regression (MLR); enrichment optimizer algorithm (EOA); General Medicine; Computer Science Applications; Random forest; Metric (mathematics); support vector machine (SVM); Algorithms; Quantitative structure–activity relationship; enrichment-based optimization; Context (language use); Machine learning; Article; Catalysis; Inorganic Chemistry; Set (abstract data type); Quantitative Structure Activity Relationship (QSAR) models; 03 medical and health sciences; Receptors Adrenergic alpha-2; Molecular descriptor; Linear regression; Humans; Physical and Theoretical Chemistry; Molecular Biology; Receptor Muscarinic M3; business.industry; Receptors Dopamine D1; Organic Chemistry; virtual screening (VS); 0104 chemical sciences; Support vector machine; 010404 medicinal & biomolecular chemistry; 030104 developmental biology; lcsh:Biology (General); lcsh:QD1-999; Linear Models; Artificial intelligence; business; computer |
Source: | International Journal of Molecular Sciences, Vol 21, Iss 21, p 7828 (2020) |
ISSN: | 1422-0067 |
DOI: | 10.3390/ijms21217828 |
Description: | Quantitative Structure Activity Relationship (QSAR) models can inform on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Due to the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, R² and Q²_CV. Similar metrics, calculated on an external set of data (e.g., Q²_F1/F2/F3), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-"ignorant". In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we have developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by "classical" metrics, e.g., R² and Q²_F1/F2/F3, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R² and/or Q²_F1/F2/F3 values were unable to pick a single active compound from within the pool, whereas in other cases models with poor R² and/or Q²_F1/F2/F3 values performed well in the context of virtual screening.
We also found no significant correlation between the number of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that, when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across the training, validation and test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening. |
Database: | OpenAIRE |
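The abstract contrasts fit-quality metrics with an enrichment-based metric but does not define either. As a rough illustration only, the sketch below computes a coefficient of determination and a commonly used enrichment factor (the hit rate among the top-ranked compounds divided by the hit rate in the whole set). The function names `r_squared` and `enrichment_factor`, the `top_n` cutoff, and the toy data are assumptions made for this example; the metric actually optimized by the authors' Enrichment Optimizer Algorithm may be defined differently.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def enrichment_factor(
    scores: Sequence[float], is_active: Sequence[bool], top_n: int
) -> float:
    """Hit rate among the top_n highest-scoring compounds divided by
    the hit rate in the full set (a common definition, assumed here)."""
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, active in ranked[:top_n] if active)
    total_actives = sum(1 for active in is_active if active)
    return (hits_in_top / top_n) / (total_actives / len(scores))


# Hypothetical toy screen: 10 compounds, 2 of them active.
scores = [9.1, 8.7, 5.0, 4.2, 4.0, 3.9, 3.5, 3.1, 2.8, 2.0]
is_active = [True, False, True, False, False, False, False, False, False, False]

ef_top2 = enrichment_factor(scores, is_active, top_n=2)
print(f"Enrichment factor in top 2: {ef_top2:.2f}")
```

An enrichment factor above 1 means the ranking places actives near the top more often than random selection would. Because it depends only on rank order among actives and inactives, it can disagree with a regression statistic such as R² computed on the same predictions.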