Learning Disentangled Representation in Latent Stochastic Models: A Case Study with Image Captioning
Author: | Lalitesh Morishetti, Eduard Hovy, Sai Krishna Rallabandi, Alan W. Black, Nidhi Vyas |
Year of publication: | 2019 |
Subject: |
Closed captioning, Training set, Computer science, Stochastic modelling, Latent variable, Machine learning, Task (project management), Image (mathematics), Prior probability, Artificial intelligence, Representation (mathematics), Feature learning |
Source: | ICASSP |
DOI: | 10.1109/icassp.2019.8683370 |
Description: | Multimodal tasks require learning a joint representation across modalities. In this paper, we present an approach that employs latent stochastic models for a multimodal task: image captioning. Encoder-decoder models with stochastic latent variables often face optimization issues such as latent collapse, which prevents them from realizing their full potential for rich representation learning and disentanglement. We present an approach to train such models by incorporating a joint continuous and discrete representation in the prior distribution. We evaluate the performance of the proposed approach on a multitude of metrics against vanilla latent stochastic models. We also perform a qualitative assessment and observe that the proposed approach indeed has the potential to learn composite information and explain novel combinations not seen in the training data. |
Database: | OpenAIRE |
External link: |
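The abstract's central idea, a prior that jointly combines a discrete code with a continuous latent vector, can be sketched minimally as follows. This is an illustrative sketch only: the dimensions, the uniform categorical over classes, and the standard Gaussian continuous component are assumptions for demonstration, not the paper's actual configuration.

```python
import math
import random

def sample_joint_prior(num_classes: int, cont_dim: int, rng: random.Random):
    """Draw one sample (one-hot discrete code, continuous vector) from the
    factorized joint prior p(c) p(z): uniform categorical x standard Gaussian."""
    k = rng.randrange(num_classes)                        # uniform over classes
    one_hot = [1.0 if i == k else 0.0 for i in range(num_classes)]
    z_cont = [rng.gauss(0.0, 1.0) for _ in range(cont_dim)]  # z ~ N(0, I)
    return one_hot, z_cont

def log_prior(one_hot, z_cont, num_classes: int) -> float:
    """Log-density of a joint sample under the factorized prior p(c) p(z)."""
    log_pc = -math.log(num_classes)                       # log 1/K
    log_pz = sum(-0.5 * (z * z + math.log(2 * math.pi)) for z in z_cont)
    return log_pc + log_pz

rng = random.Random(0)
c, z = sample_joint_prior(num_classes=10, cont_dim=4, rng=rng)
lp = log_prior(c, z, num_classes=10)
```

In a full model, the discrete code could capture composite, categorical information (e.g. which objects appear in an image) while the continuous part captures smooth variation, which is one plausible way such a prior supports the disentanglement the abstract describes.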