Showing 1 - 10 of 141 for search: '"Yu, QiFan"'
Author:
Yu, Qifan, Shen, Zhebei, Yue, Zhongqi, Wu, Yang, Zhang, Wenqiao, Li, Yunfei, Li, Juncheng, Tang, Siliang, Zhuang, Yueting
Instruction tuning fine-tunes pre-trained Multi-modal Large Language Models (MLLMs) to handle real-world tasks. However, the rapid expansion of visual instruction datasets introduces data redundancy, leading to excessive computational costs. We propo
External link:
http://arxiv.org/abs/2412.06293
Author:
Qiu, Haiyi, Gao, Minghe, Qian, Long, Pan, Kaihang, Yu, Qifan, Li, Juncheng, Wang, Wenjie, Tang, Siliang, Zhuang, Yueting, Chua, Tat-Seng
Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step spatio-te
External link:
http://arxiv.org/abs/2412.00161
Author:
Yu, Qifan, Chow, Wei, Yue, Zhongqi, Pan, Kaihang, Wu, Yang, Wan, Xiaoyang, Li, Juncheng, Tang, Siliang, Zhang, Hanwang, Zhuang, Yueting
Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to accurately execute complex user instructions, as they are trained on low-quality data
External link:
http://arxiv.org/abs/2411.15738
Author:
Chow, Wei, Li, Juncheng, Yu, Qifan, Pan, Kaihang, Fei, Hao, Ge, Zhiqi, Yang, Shuai, Tang, Siliang, Zhang, Hanwang, Sun, Qianru
In recent times, Vision-Language Models (VLMs) have been trained under two predominant paradigms. Generative training has enabled Multimodal Large Language Models (MLLMs) to tackle various complex tasks, yet issues such as hallucinations and weak obj
External link:
http://arxiv.org/abs/2411.00304
Author:
Pan, Kaihang, Fan, Zhaoyu, Li, Juncheng, Yu, Qifan, Fei, Hao, Tang, Siliang, Hong, Richang, Zhang, Hanwang, Sun, Qianru
The swift advancement in Multimodal LLMs (MLLMs) also presents significant challenges for effective knowledge editing. Current methods, including intrinsic knowledge editing and external knowledge resorting, each possess strengths and weaknesses, str
External link:
http://arxiv.org/abs/2409.19872
The field of computer-aided synthesis planning (CASP) has seen rapid advancements in recent years, achieving significant progress across various algorithmic benchmarks. However, chemists often encounter numerous infeasible reactions when using CASP i
External link:
http://arxiv.org/abs/2409.04335
Author:
Yu, Qifan, Li, Juncheng, Wei, Longhui, Pang, Liang, Ye, Wentao, Qin, Bosheng, Tang, Siliang, Tian, Qi, Zhuang, Yueting
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-genera
External link:
http://arxiv.org/abs/2311.13614
The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos
External link:
http://arxiv.org/abs/2308.07749
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. In parallel, the problem of data scarcity has brought a growing interest in employing AIGC technology for high-quality data expans
External link:
http://arxiv.org/abs/2305.12799
Scene Graph Generation (SGG) aims to extract relationships in images for vision understanding. Although recent works have made steady progress on SGG, they still suffer long-tail distribution issues that tail-predicates a
External link:
http://arxiv.org/abs/2303.13233