Showing 1 - 10
of 1,171
for search: '"Wu, WenHao"'
Author:
Zhang, Jiebin, Zhu, Dawei, Song, Yifan, Wu, Wenhao, Kuang, Chuqiao, Li, Xiaoguang, Shang, Lifeng, Liu, Qun, Li, Sujian
As large language models (LLMs) process increasing context windows, the memory usage of the KV cache has become a critical bottleneck during inference. The mainstream KV compression methods, including KV pruning and KV quantization, primarily focus on…
External link:
http://arxiv.org/abs/2412.12706
Audio Descriptions (ADs) aim to provide a narration of a movie in text form, describing non-dialogue-related narratives such as characters, actions, or scene establishment. Automatic generation of ADs remains challenging due to: i) the domain gap…
External link:
http://arxiv.org/abs/2411.18180
A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling…
External link:
http://arxiv.org/abs/2410.11448
Author:
Song, Yifan, Xiong, Weimin, Zhao, Xiutian, Zhu, Dawei, Wu, Wenhao, Wang, Ke, Li, Cheng, Peng, Wei, Li, Sujian
Fine-tuning on agent-environment interaction trajectory data holds significant promise for surfacing generalized agent capabilities in open-source large language models (LLMs). In this work, we introduce AgentBank, by far the largest trajectory tuning…
External link:
http://arxiv.org/abs/2410.07706
Author:
Xiong, Weimin, Song, Yifan, Zhao, Xiutian, Wu, Wenhao, Wang, Xun, Wang, Ke, Li, Cheng, Peng, Wei, Li, Sujian
Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome…
External link:
http://arxiv.org/abs/2406.11176
Author:
Yao, Huanjin, Wu, Wenhao, Yang, Taojiannan, Song, YuXin, Zhang, Mengxi, Feng, Haocheng, Sun, Yifan, Li, Zhiheng, Ouyang, Wanli, Wang, Jingdong
Published in:
NeurIPS 2024
Do we fully leverage the potential of the visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current…
External link:
http://arxiv.org/abs/2405.13800
Author:
Zhang, Mengxi, Wu, Wenhao, Lu, Yu, Song, Yuxin, Rong, Kang, Yao, Huanjin, Zhao, Jianbo, Liu, Fanglong, Sun, Yifan, Feng, Haocheng, Wang, Jingdong
Current multimodal Large Language Models (MLLMs) suffer from "hallucination", occasionally generating responses that are not grounded in the input images. To tackle this challenge, one promising path is to utilize reinforcement learning from human…
External link:
http://arxiv.org/abs/2405.11165
Author:
Wu, Wenhao
This paper undertakes an empirical study to revisit the latest advancements in Multimodal Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to extend an existing image-based MLLM to the video domain in a training-free manner…
External link:
http://arxiv.org/abs/2405.07798
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment…
External link:
http://arxiv.org/abs/2405.03939
Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question…
External link:
http://arxiv.org/abs/2404.15574