Evaluation of QSAR Equations for Virtual Screening

Authors: Jacob Spiegel, Hanoch Senderowitz
Year of publication: 2020
Subject:
0301 basic medicine
ERG1 Potassium Channel
Support Vector Machine
Databases, Pharmaceutical
Computer science
Drug Evaluation, Preclinical
Quantitative Structure-Activity Relationship
random forest (RF)
computer.software_genre
01 natural sciences
lcsh:Chemistry
Receptor, Serotonin, 5-HT2C
lcsh:QH301-705.5
Spectroscopy
QSAR equations
multiple linear regression (MLR)
enrichment optimizer algorithm (EOA)
General Medicine
Computer Science Applications
Random forest
Metric (mathematics)
support vector machine (SVM)
Algorithms
Quantitative structure–activity relationship
enrichment-based optimization
Context (language use)
Machine learning
Article
Catalysis
Inorganic Chemistry
Set (abstract data type)
Quantitative Structure Activity Relationship (QSAR) models
03 medical and health sciences
Receptors, Adrenergic, alpha-2
Molecular descriptor
Linear regression
Humans
Physical and Theoretical Chemistry
Molecular Biology
Receptor, Muscarinic M3
business.industry
Receptors, Dopamine D1
Organic Chemistry
virtual screening (VS)
0104 chemical sciences
Support vector machine
010404 medicinal & biomolecular chemistry
030104 developmental biology
lcsh:Biology (General)
lcsh:QD1-999
Linear Models
Artificial intelligence
business
computer
Source: International Journal of Molecular Sciences, Vol. 21, Issue 21, Article 7828 (2020)
ISSN: 1422-0067
DOI: 10.3390/ijms21217828
Description: Quantitative Structure Activity Relationship (QSAR) models can inform on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Due to the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, $R^2$ and $Q^2_{CV}$. Similar metrics, calculated on an external set of data (e.g., $Q^2_{F1/F2/F3}$), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-"ignorant". In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets, and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by "classical" metrics, e.g., $R^2$ and $Q^2_{F1/F2/F3}$, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable $R^2$ and/or $Q^2_{F1/F2/F3}$ values were unable to pick a single active compound from within the pool, whereas in other cases models with poor $R^2$ and/or $Q^2_{F1/F2/F3}$ values performed well in the context of virtual screening. We also found no significant correlation between the numbers of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that, when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across the training, validation and test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
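The contrast the abstract draws between a "classical" fit metric and a screening-aware metric can be illustrated with a minimal Python sketch. It is not the authors' EOA and does not use their data: the function names, the toy numbers, and the particular enrichment-factor definition (hit rate among the top-ranked compounds divided by the hit rate in the whole pool) are assumptions chosen only for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def actives_in_top(scores: Sequence[float], is_active: Sequence[int], top_n: int) -> int:
    """Count known actives among the top_n highest-scoring compounds."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(is_active[i] for i in ranked[:top_n])


def enrichment_factor(scores: Sequence[float], is_active: Sequence[int], top_n: int) -> float:
    """Hit rate in the top_n selection relative to the hit rate in the whole pool.

    This is one common definition, assumed here; the paper's exact metric
    is not given in the abstract.
    """
    hit_rate_top = actives_in_top(scores, is_active, top_n) / top_n
    hit_rate_pool = sum(is_active) / len(scores)
    return hit_rate_top / hit_rate_pool


# Hypothetical measured activities of four active compounds.
y_true = [6.0, 7.0, 8.0, 9.0]
pred_a = [6.1, 6.9, 8.2, 8.8]  # model A: close fit to the measured values
pred_b = [8.0, 8.0, 8.5, 8.5]  # model B: poor fit to the measured values

# Hypothetical screening pool: 2 actives followed by 8 random compounds.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pool_a = [7.0, 8.0, 9.0, 9.5, 8.5, 6.0, 5.0, 4.0, 3.0, 2.0]  # A over-scores decoys
pool_b = [8.0, 8.5, 5.0, 4.0, 3.0, 6.0, 5.5, 4.5, 3.5, 2.0]  # B ranks actives first

print("R2 A:", r_squared(y_true, pred_a), "EF A:", enrichment_factor(pool_a, labels, 2))
print("R2 B:", r_squared(y_true, pred_b), "EF B:", enrichment_factor(pool_b, labels, 2))
```

In this toy case the model with the better fit on the actives retrieves none of them from the pool, while the model with the worse fit retrieves both. That is the kind of disagreement between the two metric families that the abstract reports.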
Database: OpenAIRE