Showing 1 - 10 of 348 for search: '"Liu Yuanxin"'
Author:
Li, Lei, Liu, Yuanxin, Yao, Linli, Zhang, Peiyuan, An, Chenxin, Wang, Lean, Sun, Xu, Kong, Lingpeng, Liu, Qi
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation to the…
External link:
http://arxiv.org/abs/2410.06166
The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored…
External link:
http://arxiv.org/abs/2405.20985
Author:
Chen, Sishuo, Li, Lei, Ren, Shuhuai, Gao, Rundong, Liu, Yuanxin, Bi, Xiaohan, Sun, Xu, Hou, Lu
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of…
External link:
http://arxiv.org/abs/2403.19221
Author:
Liu, Yuanxin, Li, Shicheng, Liu, Yi, Wang, Yuxiang, Ren, Shuhuai, Li, Lei, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, there has been a surge of interest in video large language models (Video LLMs). However, existing benchmarks fail to provide comprehensive feedback on the temporal perception ability of Video LLMs. On the one hand, most of them are unable…
External link:
http://arxiv.org/abs/2403.00476
The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of s…
External link:
http://arxiv.org/abs/2311.17404
Author:
Liu, Yuanxin, Li, Lei, Ren, Shuhuai, Gao, Rundong, Li, Shicheng, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly demonstrated through qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two c…
External link:
http://arxiv.org/abs/2311.01813
Author:
Li, Xintong, Liu, Yuanxin, Gui, Jun, Gan, Lu (ganlu@wchscu.cn), Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Advanced Science. 11/6/2024, Vol. 11, Issue 41, pp. 1-23.
Author:
Liu, Yuanxin, Yang, Xue, Wang, Yan, Zhou, Laiyan, Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Oncologie (De Gruyter). Nov. 2024, Vol. 26, Issue 6, pp. 1003-1017.
Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite their advanced capacity. For resource-constrained devices, there is an urgent need for a spatially and temporally efficient model which retains the maj…
External link:
http://arxiv.org/abs/2210.15523
Despite the excellent performance of vision-language pre-trained models (VLPs) on the conventional VQA task, they still suffer from two problems: First, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD)…
External link:
http://arxiv.org/abs/2210.14558