Showing 1 - 4 of 4
for search: '"Ma, Tianren"'
Author:
Ma, Tianren, Xie, Lingxi, Tian, Yunjie, Yang, Boyu, Zhang, Yuan, Doermann, David, Ye, Qixiang
An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing
External link:
http://arxiv.org/abs/2406.11327
Author:
Qiu, Jihao, Zhang, Yuan, Tang, Xi, Xie, Lingxi, Ma, Tianren, Yan, Pengyu, Doermann, David, Ye, Qixiang, Tian, Yunjie
Videos carry rich visual information including object description, action, interaction, etc., but existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we
External link:
http://arxiv.org/abs/2406.00258
Author:
Tian, Yunjie, Ma, Tianren, Xie, Lingxi, Qiu, Jihao, Tang, Xi, Zhang, Yuan, Jiao, Jianbin, Tian, Qi, Ye, Qixiang
In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues. We present a new benchmark and an efficient vision-language model
External link:
http://arxiv.org/abs/2401.13307
Author:
Ma, Tianren, Xia, Zhengyou
Published in:
Modern Physics Letters B; May 2017, Vol. 31, Issue 14, p. 1, 18 pp.