Showing 1 - 10 of 333
for search: '"XU Zhiyang"'
Existing information retrieval (IR) models often assume a homogeneous structure for knowledge sources and user queries, limiting their applicability in real-world settings where retrieval is inherently heterogeneous and diverse. In this paper, we int…
External link:
http://arxiv.org/abs/2410.20163
Author:
Qi, Jingyuan, Xu, Zhiyang, Shao, Rulin, Chen, Yang, Di, Jin, Cheng, Yu, Wang, Qifan, Huang, Lifu
Current vision-language models (VLMs) still exhibit inferior performance on knowledge-intensive tasks, primarily due to the challenge of accurately encoding all the associations between visual objects and scenes to their corresponding entities and ba…
External link:
http://arxiv.org/abs/2410.08876
Integrating the 3D world into large language models (3D-based LLMs) has been a promising research direction for 3D scene understanding. However, current 3D-based LLMs fall short in situated understanding due to two key limitations: 1) existing 3D dat…
External link:
http://arxiv.org/abs/2410.03878
Author:
Wang, Haibo, Xu, Zhiyang, Cheng, Yu, Diao, Shizhe, Zhou, Yufan, Cao, Yixin, Wang, Qifan, Ge, Weifeng, Huang, Lifu
Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding. In this paper, we introduce Grounded-VideoLLM, a novel Video-LLM ad…
External link:
http://arxiv.org/abs/2410.03290
Author:
Liu, Yang, Zhu, Xichou, Shen, Zhou, Liu, Yi, Li, Min, Chen, Yujun, John, Benzi, Ma, Zhenzhen, Hu, Tao, Li, Zhi, Xu, Zhiyang, Luo, Wei, Wang, Junhui
Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding. However, how to comprehensively assess the sentiment capabilities of LLMs continues to be a challenge. This paper investigates the abilit…
External link:
http://arxiv.org/abs/2409.02370
Author:
Yang, Fengyu, Feng, Chao, Wang, Daniel, Wang, Tianye, Zeng, Ziyao, Xu, Zhiyang, Park, Hyoungseob, Ji, Pengliang, Zhao, Hanbin, Li, Yuanning, Wong, Alex
Understanding neural activity and information representation is crucial for advancing knowledge of brain function and cognition. Neural activity, measured through techniques like electrophysiology and neuroimaging, reflects various aspects of informa…
External link:
http://arxiv.org/abs/2407.14020
Author:
Xu, Zhiyang, Liu, Minqian, Shen, Ying, Rimchala, Joy, Zhang, Jiaxin, Wang, Qifan, Cheng, Yu, Huang, Lifu
Recent advancements in Vision-Language Models (VLMs) have led to the development of Vision-Language Generalists (VLGs) capable of understanding and generating interleaved images and text. Despite these advances, VLGs still struggle to follow user ins…
External link:
http://arxiv.org/abs/2407.03604
Author:
Liu, Minqian, Xu, Zhiyang, Lin, Zihao, Ashby, Trevor, Rimchala, Joy, Zhang, Jiaxin, Huang, Lifu
Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in…
External link:
http://arxiv.org/abs/2406.14643
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal ins…
External link:
http://arxiv.org/abs/2402.15896
Author:
Xu, Zhiyang, Feng, Chao, Shao, Rulin, Ashby, Trevor, Shen, Ying, Jin, Di, Cheng, Yu, Wang, Qifan, Huang, Lifu
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) ann…
External link:
http://arxiv.org/abs/2402.11690