Bayesian Model Averaging for Ensemble-Based Estimates of Solvation Free Energies

Authors: Christopher C. Overall, Sarah M. Reehl, Paul D. Whitney, Luke J. Gosink, David L. Mobley, Nathan A. Baker
Language: English
Year of publication: 2016
Subject:
FOS: Computer and information sciences
Work (thermodynamics)
FOS: Physical sciences
Bayesian inference
Ligands
01 natural sciences
Statistics - Applications
Article
Bayes' theorem
q-bio.BM
Engineering
Computational chemistry
0103 physical sciences
Materials Chemistry
Range (statistics)
Applications (stat.AP)
Statistical physics
Physical and Theoretical Chemistry
Physics::Chemical Physics
stat.AP
Statistical ensemble
010304 chemical physics
Chemistry
Solvation
Proteins
Statistical model
Biomolecules (q-bio.BM)
Bayes Theorem
Computational Physics (physics.comp-ph)
Statistical process control
0104 chemical sciences
Surfaces, Coatings and Films
010404 medicinal & biomolecular chemistry
Solubility
Quantitative Biology - Biomolecules
physics.comp-ph
FOS: Biological sciences
Chemical Sciences
Physical Sciences
Solvents
Quantum Theory
Thermodynamics
Physics - Computational Physics
Source: Gosink, LJ; Overall, CC; Reehl, SM; Whitney, PD; Mobley, DL; & Baker, NA. (2017). Bayesian Model Averaging for Ensemble-Based Estimates of Solvation-Free Energies. JOURNAL OF PHYSICAL CHEMISTRY B, 121(15), 3458-3472. doi: 10.1021/acs.jpcb.6b09198. UC Irvine: Retrieved from: http://www.escholarship.org/uc/item/66r0m84w
The journal of physical chemistry. B, vol 121, iss 15
Description: This paper applies the Bayesian Model Averaging (BMA) statistical ensemble technique to estimate small-molecule solvation free energies. A wide range of methods is available for predicting solvation free energies, from empirical statistical models to ab initio quantum mechanical approaches. Each of these methods rests on a set of conceptual assumptions that can affect predictive accuracy and transferability. Using an iterative statistical process, we have selected and combined solvation energy estimates from an ensemble of 17 diverse methods from the fourth Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) blind prediction study to form a single, aggregated solvation energy estimate. The ensemble design process evaluates the statistical information in each individual method as well as the performance of the aggregate estimate obtained from the ensemble as a whole. Methods that contribute minimal or redundant information are pruned from the ensemble, and the evaluation process repeats until aggregate predictive performance can no longer be improved. We show that this process yields a final aggregate estimate that outperforms all individual methods, reducing estimate errors by as much as 91%, to an accuracy of 1.2 kcal/mol. We also compare our iterative refinement approach to other statistical ensemble approaches and demonstrate that the iterative process reduces estimate errors by as much as 61%. This work provides a new approach for accurate solvation free energy prediction and lays the foundation for future work on aggregate models that can balance computational cost with prediction accuracy.
Database: OpenAIRE
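
Illustrative sketch: The description above outlines an iterative loop: weight each method, form an aggregate estimate, prune methods that do not help, and repeat until the aggregate stops improving. The Python below is a minimal sketch of that loop and not the authors' implementation. It makes several assumptions that the record does not state: weights come from a simple Gaussian likelihood with equal model priors (a simplified stand-in for a full BMA fit), pruning is greedy backward elimination scored by in-sample RMSE, and all function names (`bma_weights`, `aggregate`, `prune_ensemble`) and the toy numbers are hypothetical.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def bma_weights(preds, obs):
    """Assumed weighting: normalized Gaussian likelihoods with equal priors."""
    n = len(obs)
    log_liks = []
    for p in preds:
        var = max(rmse(p, obs) ** 2, 1e-12)  # ML error variance, floored
        log_liks.append(-0.5 * n * (math.log(2 * math.pi * var) + 1.0))
    top = max(log_liks)  # subtract the maximum for numerical stability
    raw = [math.exp(ll - top) for ll in log_liks]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(preds, weights):
    """Weighted average of the methods' predictions for each molecule."""
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n)]


def ensemble_error(names, table, obs):
    """In-sample RMSE of the weighted aggregate built from `names`."""
    preds = [table[k] for k in names]
    return rmse(aggregate(preds, bma_weights(preds, obs)), obs)


def prune_ensemble(table, obs):
    """Greedy backward elimination: drop one method at a time while RMSE falls."""
    members = sorted(table)
    best = ensemble_error(members, table, obs)
    while len(members) > 1:
        trials = [
            (ensemble_error([m for m in members if m != d], table, obs), d)
            for d in members
        ]
        err, drop = min(trials)
        if err >= best:  # no single removal improves the aggregate: stop
            break
        members.remove(drop)
        best = err
    return members, best


if __name__ == "__main__":
    # Made-up numbers in kcal/mol for illustration only; not SAMPL4 data.
    observed = [-1.0, -3.0, -5.0, -2.0]
    methods = {
        "method_a": [-1.2, -2.7, -5.4, -2.1],
        "method_b": [-0.6, -3.5, -4.4, -1.8],
        "method_c": [1.0, -6.0, -2.0, -4.5],
    }
    kept, error = prune_ensemble(methods, observed)
    print("kept methods:", kept)
    print("aggregate RMSE (kcal/mol):", round(error, 3))
```

Because the sketch fits the weights and scores the ensemble on the same data, its error is optimistic. A more careful version would hold out molecules for evaluation, and the likelihood-based weights used here concentrate quickly on the single best method as the number of molecules grows.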