BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Author: Hou, Haowen; Yan, Xiaopeng; Zhang, Yigeng; Lian, Fengzong; Kang, Zhanhui
Publication Year: 2022
Subject:
Document Type: Working Paper
Description: In cross-modal retrieval, single-encoder models tend to outperform dual-encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual-encoder model called BagFormer that uses a cross-modal interaction mechanism to improve recall without sacrificing latency or throughput. BagFormer achieves this through bag-wise interactions, which transform the text to a more appropriate granularity and incorporate entity knowledge into the model. Our experiments demonstrate that BagFormer achieves results comparable to state-of-the-art single-encoder models on cross-modal retrieval tasks, while also offering efficient training and inference with 20.72 times lower latency and 25.74 times higher throughput.
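The abstract does not spell out how bag-wise scoring works, so the following is only a minimal illustrative sketch of a late-interaction scoring scheme of the kind dual-encoder models with fine-grained interaction typically use: each text "bag" embedding is matched against its best image-patch embedding, and the per-bag scores are averaged. The function name `bag_similarity` and the max-then-mean aggregation are assumptions for illustration, not BagFormer's actual method.

```python
import numpy as np

def bag_similarity(text_bags: np.ndarray, image_patches: np.ndarray) -> float:
    """Score one text-image pair with a bag-wise late interaction.

    text_bags:     (T, d) embeddings, one per text bag (hypothetical granularity,
                   e.g. entity-level chunks rather than single tokens)
    image_patches: (P, d) patch embeddings from the image encoder
    """
    # L2-normalize so dot products are cosine similarities
    t = text_bags / np.linalg.norm(text_bags, axis=1, keepdims=True)
    v = image_patches / np.linalg.norm(image_patches, axis=1, keepdims=True)
    sim = t @ v.T  # (T, P) pairwise cosine similarities
    # Late interaction: each text bag takes its best-matching image patch,
    # then the per-bag scores are averaged into one retrieval score.
    return float(sim.max(axis=1).mean())

rng = np.random.default_rng(0)
score = bag_similarity(rng.normal(size=(5, 64)), rng.normal(size=(49, 64)))
```

Because both sides are encoded independently and only this cheap scoring step crosses modalities, image embeddings can be pre-computed and indexed, which is the usual source of a dual encoder's latency and throughput advantage over single-encoder rerankers.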
Comment: 8 pages, 4 figures, 4 tables
Database: arXiv