Efficient Neural Architecture for Text-to-Image Synthesis
Author: | Duncan D. Ruiz, Jonatas Wehrmann, Douglas M. Souza |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Machine Learning (cs.LG); Machine Learning (stat.ML); Machine learning; Artificial intelligence; Multimodal learning; Task analysis; Architecture; Sentence; Generator; Interpolation |
Source: | IJCNN |
Description: | Text-to-image synthesis is the task of generating images from text descriptions. Image generation by itself is a challenging task; combining image generation and text raises the complexity to a new level, since data from two different modalities must be combined. Most recent works in text-to-image synthesis follow a similar approach to neural architectures: due to the aforementioned difficulties, plus the inherent difficulty of training GANs at high resolutions, most methods have adopted a multi-stage training strategy. In this paper, we shift the architectural paradigm currently used in text-to-image methods and show that an effective neural architecture can achieve state-of-the-art performance with single-stage training, using a single generator and a single discriminator. We do so by applying deep residual networks along with a novel sentence interpolation strategy that enables learning a smooth conditional space. Finally, our work points to a new direction for text-to-image research, which has not experimented with novel neural architectures recently. |
Database: | OpenAIRE |
External link: |
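The abstract's "sentence interpolation strategy" can be sketched in miniature: the generator is conditioned on sentence embeddings, and training on convex mixtures of embeddings from different captions encourages the conditional space to vary smoothly. The scheme below (plain linear mixing with a random coefficient per sample, a 256-dimensional embedding) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def interpolate_sentence_embeddings(emb_a, emb_b, alpha):
    """Linearly mix two sentence embeddings.

    Hypothetical sketch: feeding such mixtures to the generator during
    training, instead of only raw caption embeddings, is one way to
    encourage a smooth conditional space as described in the abstract.
    """
    return (1.0 - alpha) * emb_a + alpha * emb_b

# Toy usage: sample alpha ~ U(0, 1) per training example and condition
# the generator on the mixed embedding (dimension 256 is an assumption).
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(256)  # embedding of caption A
emb_b = rng.standard_normal(256)  # embedding of caption B
alpha = rng.uniform()
mixed = interpolate_sentence_embeddings(emb_a, emb_b, alpha)
```

At alpha = 0 the mixture reduces to the first caption's embedding and at alpha = 1 to the second's, so the interpolation path connects valid conditioning points.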