Dynamic Weighted Combiner for Mixed-Modal Image Retrieval

Author:	Huang, Fuxiang; Zhang, Lei; Fu, Xiaowei; Song, Suqi
Year of publication:	2023
Subject:	Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	Mixed-Modal Image Retrieval (MMIR), a flexible search paradigm, has attracted wide attention. However, previous approaches achieve limited performance because two critical factors are seriously overlooked. 1) The image and text modalities contribute differently, yet they are incorrectly treated as equal. 2) Web datasets from diverse real-world scenarios contain inherent labeling noise in the text that describes users' intentions, which gives rise to overfitting. We propose a Dynamic Weighted Combiner (DWC) to tackle these challenges, which has three merits. First, we propose an Editable Modality De-equalizer (EMD) that takes into account the contribution disparity between modalities and contains two modality feature editors and an adaptive weighted combiner. Second, to alleviate labeling noise and data bias, we propose a dynamic soft-similarity label generator (SSG) to implicitly improve noisy supervision. Finally, to bridge modality gaps and facilitate similarity learning, we propose a CLIP-based mutual enhancement module alternately trained by a mixed-modality contrastive loss. Extensive experiments verify that our proposed model significantly outperforms state-of-the-art methods on real-world datasets. The source code is available at https://github.com/fuxianghuang1/DWC. Comment: 11 pages, 12 figures and 12 tables. To appear in AAAI 2024
Database:	arXiv
External link:	http://arxiv.org/abs/2312.06179
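
Note: The description mentions an adaptive weighted combiner that does not treat image and text features equally. The snippet below is only a minimal sketch of that general idea, not the authors' implementation (see the linked repository for that). The function names, the use of a softmax over two scalar scores, and the final L2 normalization are all assumptions made for illustration. In the paper the weights are described as adaptive, whereas here the two scores are simply passed in by the caller.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def weighted_combine(image_feat, text_feat, image_score, text_score):
    """Fuse two same-length feature vectors with unequal modality weights.

    image_score and text_score are hypothetical per-query relevance scores;
    a real model would predict them from the features themselves.
    """
    w_img, w_txt = softmax([image_score, text_score])
    fused = [w_img * i + w_txt * t for i, t in zip(image_feat, text_feat)]
    return l2_normalize(fused), (w_img, w_txt)
```

With equal scores the two modalities are weighted 0.5 each. Raising one score shifts the fused query toward that modality, which is the disparity that a fixed, equal-weight sum cannot express.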