Showing 1 - 10 of 30 for search: '"Rao, Fengyun"'
Author:
Wu, Yongliang, Hu, Xinting, Sun, Yuyang, Zhou, Yizhou, Zhu, Wenbo, Rao, Fengyun, Schiele, Bernt, Yang, Xu
Video Large Language Models (Vid-LLMs) have made remarkable advancements in comprehending video content for QA dialogue. However, they struggle to extend this visual understanding to tasks requiring precise temporal localization, known as Video Tempo…
External link:
http://arxiv.org/abs/2411.10332
Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of i…
External link:
http://arxiv.org/abs/2410.10798
Author:
Yue, Xinli, Sun, Jianhui, Kong, Han, Yao, Liangchao, Wang, Tianyi, Li, Lei, Rao, Fengyun, Lv, Jing, Xia, Fan, Deng, Yuetang, Wang, Qian, Zhao, Lingchen
In recent years, AI generative models have made remarkable progress across various domains, including text generation, image generation, and video generation. However, assessing the quality of text-to-video generation is still in its infancy, and exi…
External link:
http://arxiv.org/abs/2409.14888
Author:
Yue, Xinli, Sun, Jianhui, Yao, Liangchao, Xia, Fan, Deng, Yuetang, Wang, Tianyi, Li, Lei, Rao, Fengyun, Lv, Jing, Wang, Qian, Zhao, Lingchen
The increasing popularity of short video platforms such as YouTube Shorts, TikTok, and Kwai has led to a surge in User-Generated Content (UGC), which presents significant challenges for the generalization performance of Video Quality Assessment (VQA)…
External link:
http://arxiv.org/abs/2409.14847
Author:
Ma, Feipeng, Zhou, Yizhou, Li, Hebei, He, Zilong, Wu, Siying, Rao, Fengyun, Zhang, Yueyi, Sun, Xiaoyan
In the realm of multimodal research, numerous studies leverage substantial image-text pairs to conduct modal alignment learning, transforming Large Language Models (LLMs) into Multimodal LLMs and excelling in a variety of visual-language tasks. The p…
External link:
http://arxiv.org/abs/2408.11795
Author:
Ma, Feipeng, Xue, Hongwei, Wang, Guangting, Zhou, Yizhou, Rao, Fengyun, Yan, Shilin, Zhang, Yueyi, Wu, Siying, Shou, Mike Zheng, Sun, Xiaoyan
Existing Multimodal Large Language Models (MLLMs) follow the paradigm that perceives visual information by aligning visual features with the input space of Large Language Models (LLMs), and concatenating visual tokens with text tokens to form a unifi…
External link:
http://arxiv.org/abs/2405.20339
Author:
Ma, Feipeng, Xue, Hongwei, Wang, Guangting, Zhou, Yizhou, Rao, Fengyun, Yan, Shilin, Zhang, Yueyi, Wu, Siying, Shou, Mike Zheng, Sun, Xiaoyan
Most multi-modal tasks can be formulated into problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding.
External link:
http://arxiv.org/abs/2405.19333
Author:
Xu, Liang, Zhou, Yizhou, Yan, Yichao, Jin, Xin, Zhu, Wenhan, Rao, Fengyun, Yang, Xiaokang, Zeng, Wenjun
Humans constantly interact with their surrounding environments. Current human-centric generative models mainly focus on synthesizing humans plausibly interacting with static scenes and objects, while the dynamic human action-reaction synthesis for ub…
External link:
http://arxiv.org/abs/2403.11882
Author:
Xu, Liang, Lv, Xintao, Yan, Yichao, Jin, Xin, Wu, Shuwen, Xu, Congsheng, Liu, Yifan, Zhou, Yizhou, Rao, Fengyun, Sheng, Xingdong, Liu, Yunhui, Zeng, Wenjun, Yang, Xiaokang
The analysis of the ubiquitous human-human interactions is pivotal for understanding humans as social beings. Existing human-human interaction datasets typically suffer from inaccurate body motions, lack of hand gestures and fine-grained textual desc…
External link:
http://arxiv.org/abs/2312.16051
Image captioning requires numerous annotated image-text pairs, resulting in substantial annotation costs. Recently, large models (e.g. diffusion models and large language models) have excelled in producing high-quality images and text. This potential…
External link:
http://arxiv.org/abs/2305.18072