Showing 1 - 10 of 385 for search: '"Liu Yuanxin"'
Author:
Ouyang, Kun, Liu, Yuanxin, Li, Shicheng, Liu, Yi, Zhou, Hao, Meng, Fandong, Zhou, Jie, Sun, Xu
Multimodal punchlines, which involve humor or sarcasm conveyed in image-caption pairs, are a popular form of communication on online multimedia platforms. With the rapid development of multimodal large language models (MLLMs), it is essential to assess…
External link:
http://arxiv.org/abs/2412.11906
Author:
Li, Lei, Liu, Yuanxin, Yao, Linli, Zhang, Peiyuan, An, Chenxin, Wang, Lean, Sun, Xu, Kong, Lingpeng, Liu, Qi
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation to the…
External link:
http://arxiv.org/abs/2410.06166
The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored…
External link:
http://arxiv.org/abs/2405.20985
Author:
Chen, Sishuo, Li, Lei, Ren, Shuhuai, Gao, Rundong, Liu, Yuanxin, Bi, Xiaohan, Sun, Xu, Hou, Lu
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, existing models are constrained by the assumption of constant availability of…
External link:
http://arxiv.org/abs/2403.19221
Author:
Liu, Yuanxin, Li, Shicheng, Liu, Yi, Wang, Yuxiang, Ren, Shuhuai, Li, Lei, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, there has been a surge of interest in video large language models (Video LLMs). However, existing benchmarks fail to provide comprehensive feedback on the temporal perception ability of Video LLMs. On the one hand, most of them are unable…
External link:
http://arxiv.org/abs/2403.00476
The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of s…
External link:
http://arxiv.org/abs/2311.17404
Author:
Liu, Yuanxin, Li, Lei, Ren, Shuhuai, Gao, Rundong, Li, Shicheng, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly shown through qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two c…
External link:
http://arxiv.org/abs/2311.01813
Author:
Li, Xintong, Liu, Yuanxin, Gui, Jun, Gan, Lu (ganlu@wchscu.cn), Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Advanced Science, Vol. 11, Issue 41 (Nov 6, 2024), pp. 1-23.
Author:
Liu, Yuanxin, Yang, Xue, Wang, Yan, Zhou, Laiyan, Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Oncologie (De Gruyter), Vol. 26, Issue 6 (Nov 2024), pp. 1003-1017.
Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite their advanced capacity. For resource-constrained devices, there is an urgent need for a spatially and temporally efficient model that retains the maj…
External link:
http://arxiv.org/abs/2210.15523