Infrared and Visible Image Fusion via General Feature Embedding From CLIP and DINOv2

Authors: Yichuang Luo, Fang Wang, Xiaohu Liu
Language: English
Year of publication: 2024
Subject:
Source: IEEE Access, Vol 12, pp. 99362-99371 (2024)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3428407
Description: Jointly addressing multi-modal image fusion and subsequent high-level tasks is attracting increasing research interest, since the two can promote each other. However, owing to the feature gap between the two tasks, complicated network structures and training strategies need to be redesigned for each specific dataset. To address these issues, this paper proposes infrared and visible image fusion via general feature embedding from frozen CLIP and DINOv2 models. The core idea is that general semantic features from the CLIP model are injected into the fusion network, with a DINOv2-based segmenter acting as a constraint. Specifically, a feature merging module and injection strategies are designed to generate semantic features that are compatible with the fusion features while remaining aligned with the DINOv2 features. Leveraging the generalization ability of these foundation models, the proposed network can be optimized jointly, which promotes the training process. Comprehensive experiments on four public datasets demonstrate the effectiveness of our method.
Database: Directory of Open Access Journals
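
The description above only outlines the architecture at a high level (frozen foundation-model features merged and injected into a fusion network, under a constraint from a second frozen model). Below is a minimal, self-contained PyTorch sketch of that idea. The FrozenSemanticEncoder, FeatureMergeInject, and toy fusion encoder/decoder are illustrative placeholders standing in for the paper's CLIP/DINOv2 backbones and modules, and a simple feature-alignment MSE loss stands in for the DINOv2-based segmenter constraint; none of these names or shapes come from the paper itself.

```python
# Sketch of the general idea only: semantic features from a frozen encoder are
# merged and injected into a fusion network, while an alignment loss keeps the
# injected features close to a second frozen encoder's representation.
# All modules and shapes here are illustrative assumptions, not the authors' design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenSemanticEncoder(nn.Module):
    """Stand-in for a frozen foundation model (e.g. a CLIP or DINOv2 image encoder)."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1),
        )
        for p in self.parameters():          # frozen: no gradient updates
            p.requires_grad = False

    def forward(self, x):
        return self.backbone(x)


class FeatureMergeInject(nn.Module):
    """Hypothetical merge-and-inject block: merges semantic features from the two
    modalities and injects them into the fusion features via a 1x1 projection."""
    def __init__(self, sem_dim=256, fus_dim=64):
        super().__init__()
        self.merge = nn.Conv2d(2 * sem_dim, sem_dim, 1)
        self.inject = nn.Conv2d(sem_dim, fus_dim, 1)

    def forward(self, sem_ir, sem_vis, fus_feat):
        sem = self.merge(torch.cat([sem_ir, sem_vis], dim=1))
        sem_up = F.interpolate(sem, size=fus_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return fus_feat + self.inject(sem_up), sem


# Toy fusion network: a single encoder conv and a single decoder conv.
fus_enc = nn.Conv2d(6, 64, 3, padding=1)      # IR + visible stacked on channels
fus_dec = nn.Conv2d(64, 3, 3, padding=1)
clip_like = FrozenSemanticEncoder()           # plays the role of frozen CLIP
dino_like = FrozenSemanticEncoder()           # plays the role of frozen DINOv2
merge_inject = FeatureMergeInject()

ir = torch.rand(1, 3, 128, 128)               # infrared input (3-channel for simplicity)
vis = torch.rand(1, 3, 128, 128)              # visible input

fus_feat = fus_enc(torch.cat([ir, vis], dim=1))
injected, merged_sem = merge_inject(clip_like(ir), clip_like(vis), fus_feat)
fused = torch.sigmoid(fus_dec(injected))      # fused image estimate

# Alignment constraint (stand-in for the DINOv2-based segmenter constraint):
# the merged semantic features should stay close to the second frozen
# encoder's features extracted from the fused image.
with torch.no_grad():
    target = dino_like(fused)
align_loss = F.mse_loss(merged_sem, target)
print(fused.shape, float(align_loss))
```

In this sketch only the merge/inject projections and the toy fusion convolutions receive gradients; the two stand-in foundation encoders stay frozen, mirroring the abstract's claim that the generalization of frozen CLIP/DINOv2 features can guide training without redesigning the backbone per dataset.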