Learning Disentangled Representation in Latent Stochastic Models: A Case Study with Image Captioning

Authors: Lalitesh Morishetti, Eduard Hovy, Sai Krishna Rallabandi, Alan W. Black, Nidhi Vyas
Year of publication: 2019
Subject:
Source: ICASSP
DOI: 10.1109/icassp.2019.8683370
Description: Multimodal tasks require learning joint representations across modalities. In this paper, we present an approach that employs latent stochastic models for a multimodal task, image captioning. Encoder-decoder models with stochastic latent variables often face optimization issues such as latent collapse, which prevent them from realizing their full potential for rich representation learning and disentanglement. We present an approach to train such models by incorporating joint continuous and discrete representations in the prior distribution. We evaluate the performance of the proposed approach on a multitude of metrics against vanilla latent stochastic models. We also perform a qualitative assessment and observe that the proposed approach has the potential to learn composite information and explain novel combinations not seen in the training data.
Database: OpenAIRE