Boosting Cross-Modal Retrieval With MVSE++ and Reciprocal Neighbors

Author:	Wei Wei, Mengmeng Jiang, Xiangnan Zhang, Heng Liu, Chunna Tian
Language:	English
Year of publication:	2020
Subject:	Cross-modal retrieval; visual-semantic embedding; scene context; reciprocal neighbors; re-ranking method; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 8, Pp 84642-84651 (2020)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2020.2992187
Description:	In this paper, we propose to boost cross-modal retrieval by mutually aligning images and captions in terms of both features and relationships. First, we propose a multi-feature based visual-semantic embedding (MVSE++) space to retrieve candidates in the other modality, which provides a more comprehensive representation of the visual content of objects and the scene context in images. This gives more potential to find a more accurate and detailed caption for an image. However, a caption condenses the image content into a semantic description, so the cross-modal neighboring relationships starting from the visual side and from the semantic side are asymmetric. To retrieve a better cross-modal neighbor, we propose to re-rank the initially retrieved candidates according to the $k$ nearest reciprocal neighbors in the MVSE++ space. The method is evaluated on the benchmark datasets MSCOCO and Flickr30K with standard metrics. We achieve higher accuracy in caption retrieval and image retrieval at both R@1 and R@10.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/45c09b993eb24dc9bf5651f6ce1579c9
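
The re-ranking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a precomputed image-to-caption similarity matrix `sim`, and the function name `rerank_with_reciprocal_neighbors` and the simple "reciprocal candidates first" rule are hypothetical choices made only for illustration. A caption counts as a reciprocal neighbor of the query image if the image is also among the top-k images retrieved when that caption is used as the query.

```python
import numpy as np


def rerank_with_reciprocal_neighbors(sim, query, k):
    """Re-rank captions for one query image (illustrative sketch only).

    sim   : (n_images, n_captions) array of similarities in a shared
            embedding space (assumed to be given).
    query : index of the query image (row of sim).
    k     : neighborhood size used in both retrieval directions.
    """
    # Initial ranking: captions sorted by similarity to the query image.
    order = np.argsort(-sim[query], kind="stable")
    candidates = order[:k]

    reciprocal, others = [], []
    for c in candidates:
        # Reverse direction: top-k images when caption c is the query.
        back = np.argsort(-sim[:, c], kind="stable")[:k]
        if query in back:
            reciprocal.append(int(c))  # mutual (reciprocal) neighbor
        else:
            others.append(int(c))      # one-directional neighbor only

    # Reciprocal neighbors first; original order is kept within each group.
    return reciprocal + others + [int(c) for c in order[k:]]
```

In this sketch the asymmetry described in the abstract shows up whenever a caption ranks highly for an image while the image does not rank highly for that caption; such candidates are moved behind the reciprocal ones.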