Variational posterior approximation using stochastic gradient ascent with adaptive stepsize
Author: | Xudong Jiang, Kart-Leong Lim |
---|---|
Year of publication: | 2021 |
Subject: |
Computer science; MathematicsofComputing_NUMERICALANALYSIS; Inference; 02 engineering and technology; Adaptive stepsize; 01 natural sciences; Momentum; ComputingMethodologies_PATTERNRECOGNITION; Artificial Intelligence; 0103 physical sciences; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; Applied mathematics; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Closed-form expression; 010306 general physics; Fisher information; Focus (optics); Gradient descent; Software |
Source: | Pattern Recognition. 112:107783 |
ISSN: | 0031-3203 |
DOI: | 10.1016/j.patcog.2020.107783 |
Description: | Scalable algorithms for variational posterior approximation allow Bayesian nonparametric models such as the Dirichlet process mixture to scale up to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main problem with stochastic variational inference is that it relies on a closed-form solution. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore using stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we show that our approach using stochastic gradient ascent does not sacrifice performance for speed when compared to closed-form coordinate ascent learning on the evaluated datasets. Lastly, our approach is also compatible with deep ConvNet features and scales to datasets with many classes, such as Caltech256 and SUN397. |
Database: | OpenAIRE |
External link: |