Semantically Consistent Hierarchical Text to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network
Authors: | Jo Yew Tham, Ashraf A. Kassim, Joo-Hwee Lim, Kenan E. Ak |
---|---|
Year of publication: | 2019 |
Subject: | Normalization (statistics), Computer science, Feature extraction, Stability (learning theory), Normalization (image processing), Context (language use), Pattern recognition, Feature (computer vision), Noise (video), Artificial intelligence, Sentence, Similarity learning, Natural language |
Source: | ICCV Workshops |
DOI: | 10.1109/iccvw.2019.00379 |
Description: | In this paper, we present the enhanced Attentional Generative Adversarial Network (e-AttnGAN), which improves training stability for text-to-image synthesis. The integrated attention module of e-AttnGAN uses both sentence-level and word-level context features and applies feature-wise linear modulation (FiLM) to fuse visual and natural-language representations. In addition to AttnGAN's multimodal similarity learning between text and image features, e-AttnGAN adds cosine and feature-matching losses between real and generated images, together with a classification loss for "significant attributes". To improve training stability and address mode collapse, the discriminator uses spectral normalization and a two time-scale update rule, combined with instance noise. Our experiments show that e-AttnGAN outperforms state-of-the-art methods on the FashionGen and DeepFashion-Synthesis datasets. |
Database: | OpenAIRE |
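The description names several standard GAN components without giving their formulas. The following is a minimal sketch in plain Python (lists instead of tensors) of three of them, under stated assumptions: FiLM as a per-channel affine transform whose scale and shift are linear projections of a text embedding; spectral normalization with the largest singular value estimated by power iteration; and instance noise whose standard deviation anneals linearly to zero (the linear schedule and `sigma0 = 0.1` are assumptions, not values from the paper). The function names `film`, `spectral_normalize` and `add_instance_noise` are hypothetical; this is not the e-AttnGAN implementation.

```python
import math
import random

# Hypothetical helpers illustrating the named techniques; not the e-AttnGAN code.
Matrix = list[list[float]]
Vector = list[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def transpose(w: Matrix) -> Matrix:
    return [list(col) for col in zip(*w)]


def film(features: Matrix, cond: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Feature-wise linear modulation (FiLM).

    features: one row per channel, holding that channel's activations.
    cond: a text embedding (e.g. a sentence or word context vector).
    The per-channel scale gamma and shift beta are affine functions of cond.
    """
    gamma = [g + b for g, b in zip(matvec(w_gamma, cond), b_gamma)]
    beta = [s + b for s, b in zip(matvec(w_beta, cond), b_beta)]
    return [[gamma[c] * v + beta[c] for v in channel]
            for c, channel in enumerate(features)]


def _normalize(x: Vector) -> Vector:
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def spectral_normalize(w: Matrix, n_iters: int = 50,
                       seed: int = 0) -> tuple[Matrix, float]:
    """Divide w by a power-iteration estimate of its largest singular value."""
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    rng = random.Random(seed)
    u = _normalize([rng.gauss(0.0, 1.0) for _ in range(len(w))])
    wt = transpose(w)
    for _ in range(n_iters):
        v = _normalize(matvec(wt, u))  # right singular vector estimate
        u = _normalize(matvec(w, v))   # left singular vector estimate
    sigma = sum(ui * wvi for ui, wvi in zip(u, matvec(w, v)))
    return [[wij / sigma for wij in row] for row in w], sigma


def add_instance_noise(x: Vector, step: int, total_steps: int,
                       rng: random.Random, sigma0: float = 0.1) -> Vector:
    """Add Gaussian noise to a discriminator input.

    Assumption: the noise std decays linearly from sigma0 to 0 over training.
    """
    std = sigma0 * max(0.0, 1.0 - step / total_steps)
    if std == 0.0:
        return list(x)
    return [xi + rng.gauss(0.0, std) for xi in x]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    out = film(feats, [1.0], [[2.0], [0.5]], [0.0, 0.0],
               [[1.0], [0.0]], [0.0, -1.0])
    print(out)  # [[3.0, 5.0], [0.5, 1.0]]
    _, sigma = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
    print(round(sigma, 6))  # 3.0
```

In a full model, `gamma` and `beta` would come from learned layers and the weights would be tensors; spectral normalization rescales each discriminator weight matrix so that its largest singular value is about 1, which bounds how sharply the discriminator can change and is one reason it helps stabilize training.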