Description: |
Text-guided image processing has made tremendous progress in recent years. Most existing methods rely on visual-language pre-training models for text-guided image processing. However, when these models are applied to text-guided fine-grained attribute face image processing (e.g., editing a smiling face from showing teeth to a closed-mouth smile), they perform poorly because existing visual-language pre-training models learn only limited fine-grained semantic knowledge. To alleviate this problem, we propose GrainedCLIP, a novel visual-language pre-training model based on fine-grained facial attribute features. Building on GrainedCLIP, we further propose DiffusionGrainedCLIP, a new model for text-guided fine-grained attribute face image processing. Experimental results show that GrainedCLIP outperforms existing methods, achieving 12.61 R@1 in text-to-image retrieval and 12.17 R@1 in image-to-text retrieval on the FFHQ dataset. Furthermore, compared with state-of-the-art text-guided face image processing methods, DiffusionGrainedCLIP improves semantic consistency by 55.37% and face identity preservation by 49.38% on the FFHQ dataset.