Weakly Supervised Hashing with Reconstructive Cross-modal Attention.

Author: Yongchao Du, Min Wang, Zhenbo Lu, Wengang Zhou, Houqiang Li
Source: ACM Transactions on Multimedia Computing, Communications & Applications, Nov 2023, Vol. 19, Issue 6, pp. 1-19
Abstract: On many popular social websites, images are usually associated with meta-data such as textual tags, which carry semantic information relevant to the image and can be used to supervise representation learning for image retrieval. However, these user-provided tags are often polluted by noise, so the main challenge lies in mining the useful information hidden in them. Many previous works simply treat all tags equally when generating supervision, which inevitably distracts network learning. To this end, we propose a new framework, termed Weakly Supervised Hashing with Reconstructive Cross-modal Attention (WSHRCA), to learn compact visual-semantic representations with more reliable supervision for the retrieval task. Specifically, for each image-tag pair, the weak supervision from tags is refined by cross-modal attention, which takes the image feature as the query to aggregate the most content-relevant tags. Tags with relevant content thus become more prominent while noisy tags are suppressed, yielding more accurate supervisory information. To improve the effectiveness of hash learning, the image embedding in WSHRCA is reconstructed from the hash code and further optimized by a cross-modal constraint, which explicitly improves hash learning. Experiments on two widely used datasets demonstrate the effectiveness of the proposed method for weakly supervised image retrieval. The code is available at https://github.com/duyc168/weakly-supervised-hashing.
Database: Complementary Index
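
The attention-and-reconstruction scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumed dimensions, module names, and loss weights, not the authors' implementation (see the linked repository for that): the image feature acts as the attention query over tag embeddings, and the hash code is decoded back into the embedding space under a reconstruction constraint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructiveCrossModalAttention(nn.Module):
    """Minimal sketch of WSHRCA-style attention and reconstruction.

    The image feature serves as the attention query over the tag
    embeddings, so content-relevant tags dominate the aggregated
    semantic vector while noisy tags are down-weighted. The image
    embedding is also reconstructed from its hash code so that the
    code retains the information needed for retrieval. All sizes
    below are assumptions for illustration.
    """

    def __init__(self, img_dim=2048, tag_dim=300, embed_dim=512, code_len=64):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # query projection
        self.tag_proj = nn.Linear(tag_dim, embed_dim)   # key/value projection
        self.hash_layer = nn.Linear(embed_dim, code_len)
        self.decoder = nn.Linear(code_len, embed_dim)   # hash code -> embedding

    def forward(self, img_feat, tag_feats):
        # img_feat: (B, img_dim); tag_feats: (B, T, tag_dim), T tags per image
        q = self.img_proj(img_feat).unsqueeze(1)         # (B, 1, D)
        kv = self.tag_proj(tag_feats)                    # (B, T, D)
        # Scaled dot-product attention: noisy tags get low weights.
        attn = torch.softmax(
            (q @ kv.transpose(1, 2)) / kv.size(-1) ** 0.5, dim=-1)
        sem = (attn @ kv).squeeze(1)                     # refined tag supervision
        img_emb = q.squeeze(1)
        code = torch.tanh(self.hash_layer(img_emb))      # relaxed binary code
        recon = self.decoder(code)                       # reconstructed embedding
        return img_emb, sem, code, recon

def loss_fn(img_emb, sem, recon):
    # Cross-modal alignment plus reconstruction constraint
    # (equal weighting here is an assumption).
    align = 1 - F.cosine_similarity(img_emb, sem).mean()
    rec = F.mse_loss(recon, img_emb.detach())
    return align + rec
```

At retrieval time, one would binarize the relaxed code, e.g. `torch.sign(code)`, and rank database items by Hamming distance; the reconstruction term is only active during training, where it pushes the code to preserve the cross-modally aligned embedding.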