Local part attention for image stylization with text prompt.

Authors: Truong, Quoc-Truong; Nguyen, Vinh-Tiep; Nguyen, Lan-Phuong; Cao, Hung-Phu; Luu, Duc-Tuan
Source: Neural Computing & Applications; Dec 2024, Vol. 36, Issue 34, pp. 21859-21871, 13 pp.
Abstract: Prompt-based portrait image style transfer aims to translate an input content image into a desired style described by text, without a style image. In many practical situations, users may attend not only to the entire portrait image but also to its local parts (e.g., eyes, lips, and hair). To address such applications, we propose a new framework that enables style transfer on specific regions described by a text description of the desired style. Specifically, we incorporate semantic segmentation to identify the intended area without requiring edit masks from the user, while utilizing a pre-trained CLIP-based model for stylization. In addition, we propose a text-to-patch matching loss, computed by randomly dividing the stylized image into smaller patches, to ensure consistent quality of the result. To comprehensively evaluate the proposed method, we use several metrics, such as FID, SSIM, and PSNR, on a dataset consisting of portraits from the CelebAMask-HQ dataset and style descriptions drawn from related work. Extensive experimental results demonstrate that our framework outperforms other state-of-the-art methods in terms of both stylization quality and inference time. [ABSTRACT FROM AUTHOR]
Database: Complementary Index