Showing 1 - 10 of 82
for search: '"Luo, Fuwen"'
Author:
Lin, Junming; Fang, Zheng; Chen, Chi; Wan, Zihao; Luo, Fuwen; Li, Peng; Liu, Yang; Sun, Maosong
The rapid development of Multimodal Large Language Models (MLLMs) has expanded their capabilities from image comprehension to video understanding. However, most of these MLLMs focus primarily on offline video comprehension, necessitating extensive…
External link:
http://arxiv.org/abs/2411.03628
Author:
Wang, Ziyue; Chen, Chi; Luo, Fuwen; Dong, Yurui; Zhang, Yuanchi; Xu, Yuzhuang; Wang, Xiaolong; Li, Peng; Liu, Yang
Active perception, a crucial human capability, involves setting a goal based on the current understanding of the environment and performing actions to achieve that goal. Despite significant efforts in evaluating Multimodal Large Language Models (MLLMs)…
External link:
http://arxiv.org/abs/2410.04659
Large Language Models (LLMs) have achieved remarkable performance in objective tasks such as open-domain question answering and mathematical reasoning, which can often be solved through recalling learned factual knowledge or chain-of-thought style reasoning…
External link:
http://arxiv.org/abs/2402.17226
Author:
Luo, Fuwen; Chen, Chi; Wan, Zihao; Kang, Zhaolu; Yan, Qidong; Li, Yingjie; Wang, Xiaolong; Wang, Siyu; Wang, Ziyue; Mi, Xiaoyue; Li, Peng; Ma, Ning; Sun, Maosong; Liu, Yang
Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities…
External link:
http://arxiv.org/abs/2402.13607
Author:
Chen, Chi; Du, Yiyang; Fang, Zheng; Wang, Ziyue; Luo, Fuwen; Li, Peng; Yan, Ming; Zhang, Ji; Huang, Fei; Sun, Maosong; Liu, Yang
Recent developments in Multimodal Large Language Models (MLLMs) have shown rapid progress, moving towards the goal of creating versatile MLLMs that understand inputs from various modalities. However, existing methods typically rely on joint training…
External link:
http://arxiv.org/abs/2402.12750
Author:
Wang, Ziyue; Chen, Chi; Zhu, Yiqi; Luo, Fuwen; Li, Peng; Yan, Ming; Zhang, Ji; Huang, Fei; Sun, Maosong; Liu, Yang
With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks. However, they fall short…
External link:
http://arxiv.org/abs/2402.12195
Author:
Yang, Zonghan; Liu, An; Liu, Zijun; Liu, Kaiming; Xiong, Fangzhou; Wang, Yile; Yang, Zeyuan; Hu, Qingyuan; Chen, Xinrui; Zhang, Zhenhe; Luo, Fuwen; Guo, Zhicheng; Li, Peng; Liu, Yang
The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents…
External link:
http://arxiv.org/abs/2402.07744
Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we…
External link:
http://arxiv.org/abs/2309.04658
Recently, Multimodal Large Language Models (MLLMs) that enable Large Language Models (LLMs) to interpret images through visual instruction tuning have achieved significant success. However, existing visual instruction tuning methods only utilize image…
External link:
http://arxiv.org/abs/2308.13437
Academic article
This result cannot be displayed to unauthenticated users; please log in to view it.