Author:
Zhang, Chumeng, Yang, Yue, Guo, Junbo, Jin, Guoqing, Song, Dan, Liu, An-An
Source:
Multimedia Systems; Apr 2023, Vol. 29, Issue 2, p569-575, 7p
Abstract:
The text-image retrieval task has attracted extensive attention in recent years. Because images and texts follow different feature distributions, performance on this task suffers from the large modal discrepancy. Most retrieval methods map images and texts into a common embedding space and measure similarities there. However, a dataset may contain multiple texts corresponding to the same image, and previous approaches rarely consider these texts together when calculating similarities in the common space. In this paper, we propose an improved text-image cross-modal retrieval framework with a contrastive loss that considers the multiple texts of one image. Using the aggregated text features, our approach achieves better alignment between an image and its corresponding text center. Results on the Flickr30K dataset show competitive performance, validating the effectiveness of the proposed framework. [ABSTRACT FROM AUTHOR]
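The abstract does not give the exact formulation, but the idea of aligning an image with the center of its multiple captions can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch implementation that assumes mean-pooled caption embeddings as the text center and a symmetric InfoNCE-style contrastive loss; the function name, temperature value, and pooling choice are assumptions for illustration, not the authors' published method.

    # Hypothetical sketch: image-to-text-center contrastive loss.
    # Not the paper's exact loss; mean pooling and InfoNCE form are assumptions.
    import torch
    import torch.nn.functional as F

    def text_center_contrastive_loss(image_emb, text_emb, temperature=0.07):
        # image_emb: (B, D) one embedding per image
        # text_emb:  (B, K, D) K captions per image (e.g. K=5 on Flickr30K)

        # Aggregate the K caption embeddings of each image into one text center.
        text_center = text_emb.mean(dim=1)                  # (B, D)

        # Cosine similarity between every image and every text center.
        image_emb = F.normalize(image_emb, dim=-1)
        text_center = F.normalize(text_center, dim=-1)
        logits = image_emb @ text_center.t() / temperature  # (B, B)

        # Matching image/text-center pairs sit on the diagonal.
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric contrastive loss over both retrieval directions.
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

With Flickr30K's five captions per image, a batch of 32 images would pass image_emb of shape (32, 512) and text_emb of shape (32, 5, 512); the matching image/text-center pairs form the positives on the diagonal of the similarity matrix.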
Database:
Complementary Index