GSNs: generative stochastic networks
Author: | Guillaume Alain, Yoshua Bengio, Jason Yosinski, Éric Thibodeau-Laufer, Saizheng Zhang, Pascal Vincent, Li Yao |
---|---|
Year of publication: | 2016 |
Subject: |
FOS: Computer and information sciences
Statistics and Probability, Numerical Analysis, Applied Mathematics, Computational Theory and Mathematics, Machine Learning (cs.LG), Machine learning, Pseudolikelihood, Boltzmann machine, Joint probability distribution, Conditional probability distribution, Markov chain, Gibbs sampling, Recurrent neural network, Probabilistic logic, Artificial intelligence |
Source: | Information and Inference. 5:210-249 |
ISSN: | 2049-8772, 2049-8764 |
DOI: | 10.1093/imaiai/iaw003 |
Description: | We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. Because the transition distribution is a conditional distribution generally involving a small move, it has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn, more like learning to perform supervised function approximation, with gradients that can be obtained by back-propagation. The theorems provided here generalize recent work on the probabilistic interpretation of denoising auto-encoders and provide an interesting justification for dependency networks and generalized pseudolikelihood (along with defining an appropriate joint distribution and sampling mechanism, even when the conditionals are not consistent). We study how GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. Experiments validating these theoretical results are conducted on two image datasets, using a particular architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with backprop, without the need for layerwise pretraining. Comment: arXiv admin note: substantial text overlap with arXiv:1306.1091. (An illustrative code sketch of this training-and-sampling loop is given after the record below.) |
Database: | OpenAIRE |
External link: |
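
The description above outlines the GSN training principle: learn a denoising transition operator with back-propagation, then run it as a Markov chain whose stationary distribution approximates the data distribution. The following minimal sketch illustrates that loop on toy 1-D data; it is not the authors' implementation, and the layer size, noise level, learning rate, and reconstruction noise are illustrative assumptions.

```python
# Minimal GSN-style sketch: train a denoiser P(X | corrupted X) with backprop,
# then sample by alternating corruption and stochastic reconstruction.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a two-mode 1-D mixture, standing in for the image datasets.
def sample_data(n):
    modes = rng.choice([-2.0, 2.0], size=n)
    return modes + 0.3 * rng.standard_normal(n)

# One-hidden-layer denoiser: x_hat = W2 @ tanh(W1 @ x_noisy + b1) + b2
H = 32
W1 = 0.1 * rng.standard_normal((H, 1)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((1, H)); b2 = np.zeros(1)
noise_std, lr = 0.5, 0.01   # assumed corruption level and learning rate

for step in range(5000):
    x = sample_data(64)                                 # clean targets
    x_noisy = x + noise_std * rng.standard_normal(64)   # corruption C(X~|X)
    # forward pass
    a = W1 @ x_noisy[None, :] + b1[:, None]             # (H, 64)
    h = np.tanh(a)
    x_hat = (W2 @ h + b2[:, None])[0]                   # (64,)
    # squared-error loss, i.e. a Gaussian reconstruction distribution P(X|X~)
    err = x_hat - x
    # manual backprop of the mean squared error
    gW2 = (err[None, :] @ h.T) / 64; gb2 = np.array([err.mean()])
    dh = (W2.T @ err[None, :]) * (1 - h ** 2)
    gW1 = (dh @ x_noisy[:, None]) / 64; gb1 = dh.mean(axis=1)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# Sampling: iterate the learned transition operator as a Markov chain.
def gsn_chain(n_steps, recon_std=0.3):
    x = np.array([0.0])
    samples = []
    for _ in range(n_steps):
        x_noisy = x + noise_std * rng.standard_normal(1)       # corrupt
        x_hat = W2 @ np.tanh(W1 @ x_noisy + b1) + b2            # denoise
        x = x_hat + recon_std * rng.standard_normal(1)          # sample P(X|X~)
        samples.append(x[0])
    return np.array(samples)

# With enough training, the chain should visit both modes of the toy data.
print(gsn_chain(2000)[::200])
```

Because each transition only needs to model a small, nearly unimodal move around the current state, the denoiser can be trained like an ordinary supervised regression problem; the same pattern extends to the deep, DBM-like architectures described in the paper.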