TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image

Authors: Yuto Watanabe, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
Language: English
Year of publication: 2024
Subject:
Source: IEEE Open Journal of Signal Processing, Vol 5, Pp 150-159 (2024)
Document type: article
ISSN: 2644-1322
DOI: 10.1109/OJSP.2023.3343335
Description: Although text-guided image manipulation approaches have demonstrated highly accurate performance for editing the appearance of images in virtual or simple scenarios, their real-world applications face significant challenges. The primary cause of these challenges is the mismatch between the distributions of training and real-world data, which leads to unstable text-guided image manipulation. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths the mismatch between the distributions of training and real-world data. MDAM extracts only the regions highly relevant to image manipulation and assists in reconstructing objects not observed in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
Database: Directory of Open Access Journals