Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference
Author: | Jian Li, Zhize Li, Jun Zhu, Tianyi Zhang, Shuyu Cheng |
---|---|
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; ComputingMethodologies_SIMULATIONANDMODELING; Monte Carlo method; Machine Learning (stat.ML); 02 engineering and technology; Variance (accounting); Bayesian inference; Machine Learning (cs.LG); Hybrid Monte Carlo; Statistics - Machine Learning; Artificial Intelligence; 020204 information systems; Computer Science - Data Structures and Algorithms; Convergence (routing); 0202 electrical engineering, electronic engineering, information engineering; Data Structures and Algorithms (cs.DS); 020201 artificial intelligence & image processing; Variance reduction; Statistical physics; Langevin dynamics; Bayesian linear regression; Software; Mathematics |
Source: | Machine Learning. 108:1701-1727 |
ISSN: | 1573-0565 0885-6125 |
DOI: | 10.1007/s10994-019-05825-y |
Description: | Gradient-based Monte Carlo sampling algorithms, such as Langevin dynamics and Hamiltonian Monte Carlo, are important methods for Bayesian inference. In large-scale settings, full gradients are not affordable, so stochastic gradients evaluated on mini-batches are used instead. To reduce the high variance of noisy stochastic gradients, Dubey et al. [2016] applied the standard variance reduction technique to stochastic gradient Langevin dynamics and obtained both theoretical and experimental improvements. In this paper, we apply variance reduction techniques to Hamiltonian Monte Carlo and achieve better theoretical convergence results than variance-reduced Langevin dynamics. Moreover, we apply the symmetric splitting scheme in our variance-reduced Hamiltonian Monte Carlo algorithms to further improve the theoretical results. The experimental results are consistent with the theoretical results: in our experiments, variance-reduced Hamiltonian Monte Carlo performs better than variance-reduced Langevin dynamics in Bayesian regression and classification tasks on real-world datasets. 25 pages |
Database: | OpenAIRE |
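
The idea summarized in the description, replacing a noisy mini-batch gradient with a variance-reduced one inside a stochastic gradient Hamiltonian Monte Carlo update, might be sketched as follows. This is a minimal illustration, not the algorithm from the paper: it assumes an SVRG-style estimator with a periodically refreshed snapshot, a plain Euler discretization with friction (no symmetric splitting), an identity mass matrix, and a toy Bayesian linear regression model with unit noise and a standard normal prior. All function names and default parameters are hypothetical.

```python
import numpy as np


def grad_terms(theta, X, y, n_total):
    """Per-datum gradients of U_i(theta) = 0.5*(x_i.theta - y_i)^2 + 0.5*|theta|^2 / n_total.

    Assumed toy model: unit-variance Gaussian noise, standard normal prior
    whose gradient is split evenly across the n_total data points.
    """
    residual = X @ theta - y                          # shape (batch,)
    return residual[:, None] * X + theta / n_total    # shape (batch, d)


def full_gradient(theta, X, y):
    """Gradient of the full potential U(theta) = sum_i U_i(theta)."""
    return grad_terms(theta, X, y, X.shape[0]).sum(axis=0)


def svrg_gradient(theta, snapshot, snapshot_grad, X, y, idx):
    """SVRG-style estimate: full gradient at the snapshot plus a rescaled
    mini-batch correction. Unbiased for the full gradient at theta."""
    n, b = X.shape[0], len(idx)
    diff = grad_terms(theta, X[idx], y[idx], n) - grad_terms(snapshot, X[idx], y[idx], n)
    return snapshot_grad + (n / b) * diff.sum(axis=0)


def vr_sghmc(X, y, n_epochs=20, inner_steps=50, batch_size=10,
             step=1e-2, friction=1.0, seed=0):
    """Euler-discretized SGHMC with friction, driven by the SVRG-style gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta, p = np.zeros(d), np.zeros(d)
    samples = []
    for _ in range(n_epochs):
        # Refresh the snapshot and its full gradient once per epoch.
        snapshot = theta.copy()
        snapshot_grad = full_gradient(snapshot, X, y)
        for _ in range(inner_steps):
            idx = rng.choice(n, size=batch_size, replace=False)
            g = svrg_gradient(theta, snapshot, snapshot_grad, X, y, idx)
            noise = rng.normal(size=d)
            # Momentum update: gradient force, friction, injected Gaussian noise.
            p = p - step * g - step * friction * p + np.sqrt(2.0 * friction * step) * noise
            # Position update with identity mass matrix.
            theta = theta + step * p
            samples.append(theta.copy())
    return np.array(samples)
```

The estimator costs one full pass over the data per epoch plus two mini-batch gradient evaluations per step. When the current iterate equals the snapshot, the correction term vanishes and the estimate coincides with the full gradient, which is where the variance reduction comes from.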