Showing 1 - 4 of 4 for search: '"Qiu, Jihao"'
Author:
Qiu, Jihao, Zhang, Yuan, Tang, Xi, Xie, Lingxi, Ma, Tianren, Yan, Pengyu, Doermann, David, Ye, Qixiang, Tian, Yunjie
Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we…
External link:
http://arxiv.org/abs/2406.00258
Author:
Tian, Yunjie, Ma, Tianren, Xie, Lingxi, Qiu, Jihao, Tang, Xi, Zhang, Yuan, Jiao, Jianbin, Tian, Qi, Ye, Qixiang
In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues. We present a new benchmark and an efficient vision-language model…
External link:
http://arxiv.org/abs/2401.13307
We propose integrally pre-trained transformer pyramid network (iTPN), towards jointly optimizing the network backbone and the neck, so that the transfer gap between representation models and downstream tasks is minimal. iTPN is born with two elaborated designs…
External link:
http://arxiv.org/abs/2211.12735
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, Issue 12, December 2024, pp. 9766-9779.