Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation

Authors:	Foulds, J., Boyles, L., DuBois, C., Smyth, P., Welling, M.
Proceedings editors:	Dhillon, I.S., Koren, Y., Ghani, R., Senator, T.E., Bradley, P., Parekh, R., He, J., Grossman, R.L., Uthurusamy, R.
Contributors:	Amsterdam Machine Learning lab (IVI, FNWI)
Year of publication:	2013
Subject:	Text corpus; Topic model; FOS: Computer and information sciences; Computer science; Inference; Bayesian inference; Machine learning; Latent Dirichlet allocation; Dynamic topic model; Machine Learning (cs.LG); Computer Science - Learning; Variational message passing; Artificial intelligence; Representation (mathematics)
Source:	KDD '13: the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 11-14, 2013, Chicago, Illinois, USA, pp. 446-454
DOI:	10.48550/arxiv.1305.2452
Description:	In the internet era there has been an explosion in the amount of digital text information available, leading to difficulties of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. We show connections between collapsed variational Bayesian inference and MAP estimation for LDA, and leverage these connections to prove convergence properties of the proposed algorithm. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than the previous method. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
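The record summarizes the algorithm but does not give its update equations. The following is a minimal pure-Python sketch of a stochastic, CVB0-style collapsed update for LDA, written from the editor's understanding of the general approach rather than copied from the paper. Each token gets responsibilities proportional to (N_phi[w][k] + eta) / (N_z[k] + V*eta) * (N_theta[j][k] + alpha). Document-topic counts are averaged online per token, and topic-word counts are blended in once per minibatch from an estimate rescaled by C / M (corpus tokens over minibatch tokens). The function names `step_size`, `responsibilities` and `scvb0_sketch`, the step-size constants, the symmetric scalar priors, and the single burn-in pass are all illustrative assumptions.

```python
import random


def step_size(t, s=1.0, tau=10.0, kappa=0.9):
    # Assumed Robbins-Monro schedule rho_t = s / (tau + t)^kappa; with these
    # defaults rho_t < 1 for all t >= 0, so the running averages stay positive.
    return s / (tau + t) ** kappa


def responsibilities(w, theta_j, n_phi, n_z, alpha, eta):
    # CVB0-style update: gamma_k is proportional to
    # (N_phi[w][k] + eta) / (N_z[k] + V * eta) * (N_theta[j][k] + alpha)
    V = len(n_phi)
    g = [(n_phi[w][k] + eta) / (n_z[k] + V * eta) * (theta_j[k] + alpha)
         for k in range(len(n_z))]
    total = sum(g)
    return [x / total for x in g]


def scvb0_sketch(docs, V, K, alpha=0.1, eta=0.01, batch_size=2,
                 epochs=5, burn_in=1, seed=0):
    """docs: list of documents, each a list of word ids in range(V)."""
    rng = random.Random(seed)
    C = sum(len(d) for d in docs)  # total tokens in the corpus
    # Random positive initialization of the expected count statistics.
    n_phi = [[rng.random() for _ in range(K)] for _ in range(V)]
    n_z = [sum(n_phi[w][k] for w in range(V)) for k in range(K)]
    n_theta = [[rng.random() for _ in range(K)] for _ in docs]
    t = 0  # minibatch counter for the topic-word step size
    order = list(range(len(docs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            M = sum(len(docs[j]) for j in batch)  # tokens in this minibatch
            if M == 0:
                continue  # skip minibatches made only of empty documents
            hat_phi = [[0.0] * K for _ in range(V)]
            hat_z = [0.0] * K
            for j in batch:
                C_j, theta_j, u = len(docs[j]), n_theta[j], 0
                for p in range(burn_in + 1):
                    for w in docs[j]:
                        g = responsibilities(w, theta_j, n_phi, n_z, alpha, eta)
                        rho = step_size(u)
                        u += 1
                        for k in range(K):
                            # Online average of the document-topic counts.
                            theta_j[k] = (1 - rho) * theta_j[k] + rho * C_j * g[k]
                            if p == burn_in:  # only the final pass feeds phi
                                hat_phi[w][k] += C / M * g[k]
                                hat_z[k] += C / M * g[k]
            # Blend the rescaled minibatch estimate into the global statistics.
            rho = step_size(t)
            t += 1
            for k in range(K):
                for w in range(V):
                    n_phi[w][k] = (1 - rho) * n_phi[w][k] + rho * hat_phi[w][k]
                n_z[k] = (1 - rho) * n_z[k] + rho * hat_z[k]
    return n_phi, n_z, n_theta


docs = [[0, 1, 0, 1, 2], [0, 2, 1, 0], [3, 4, 3, 5], [4, 5, 3, 4, 5]]
n_phi, n_z, n_theta = scvb0_sketch(docs, V=6, K=2, epochs=10)
```

Because every increment to `hat_phi` is mirrored in `hat_z`, and both are blended with the same step size, `n_z[k]` stays equal (up to rounding) to the column sum of `n_phi` over words. This sketch keeps only these three count arrays rather than a per-token variational distribution, which is one reason collapsed stochastic updates of this kind are cheap in memory.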
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b8bb50d8ad15093bffdd94c8b8630e53