Semantically Consistent Hierarchical Text to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network

Autor: Jo Yew Tham, Ashraf A. Kassim, Joo-Hwee Lim, Kenan E. Ak
Rok vydání: 2019
Předmět:
Zdroj: ICCV Workshops
DOI: 10.1109/iccvw.2019.00379
Popis: In this paper, we present the enhanced Attentional Generative Adversarial Network (e-AttnGAN) with improved training stability for text-to-image synthesis. e-AttnGAN's integrated attention module utilizes both sentence and word context features and performs feature-wise linear modulation (FiLM) to fuse visual and natural language representations. In addition to multimodal similarity learning for text and image features of AttnGAN, cosine and feature matching losses of real and generated images are included while employing a classification loss for "significant attributes". In order to improve the stability of the training and solve the issue of model collapse, spectral normalization and two-time scale update for the discriminator are used together with instance noise. Our experiments show that e-AttnGAN outperforms state-of-the-art methods on the FashionGen and DeepFashion-Synthesis datasets.
Databáze: OpenAIRE