Showing 1 - 2 of 2 for search: '"Xie, Jingyou"'
Author:
Kuang, Jiayi, Xie, Jingyou, Luo, Haohao, Li, Ronghao, Xu, Zhe, Cheng, Xianfeng, Li, Yinghui, Lin, Xika, Shen, Ying
Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is to provide
External link:
http://arxiv.org/abs/2411.17558
Given a query from one modality, few-shot cross-modal retrieval (CMR) retrieves semantically similar instances in another modality, where the target domain includes classes that are disjoint from the source domain. Compared with classical few-shot CMR
External link:
http://arxiv.org/abs/2411.17454