Showing 1 - 10 of 296 for search: '"Liu Xihui"'
Author:
Li, Shiyao, Hu, Yingchun, Ning, Xuefei, Liu, Xihui, Hong, Ke, Jia, Xiaotao, Li, Xiuhong, Yan, Yaqi, Ran, Pei, Dai, Guohao, Yan, Shengen, Yang, Huazhong, Wang, Yu
Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter size of VLMs brings substantial memory and computation overhead, which poses significant challenges for deployment. Post-Training Quantization (PTQ) is an e…
External link:
http://arxiv.org/abs/2412.19509
Author:
Wang, Yuqing, Ren, Shuhuai, Lin, Zhijie, Han, Yujin, Guo, Haoyuan, Yang, Zhenheng, Zou, Difan, Feng, Jiashi, Liu, Xihui
Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference due to their sequential token-by-token prediction process. In this paper, we propose a simple yet effective approach for parallelized…
External link:
http://arxiv.org/abs/2412.15119
Vision-Language Models (VLMs) have shown promising capabilities in handling various multimodal tasks, yet they struggle in long-context scenarios, particularly in tasks involving videos, high-resolution images, or lengthy image-text documents. In our…
External link:
http://arxiv.org/abs/2412.09616
The advent of Multimodal Large Language Models, leveraging the power of Large Language Models, has recently demonstrated superior multimodal understanding and reasoning abilities, heralding a new era for artificial general intelligence. However, achi…
External link:
http://arxiv.org/abs/2412.04447
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been cons…
External link:
http://arxiv.org/abs/2412.04445
Text-to-video generation models have shown significant progress in recent years. However, they still struggle with generating complex dynamic scenes based on compositional text prompts, such as attribute binding for multiple objects, temporal dyn…
External link:
http://arxiv.org/abs/2412.04440
Author:
Huang, Zehuan, Guo, Yuan-Chen, An, Xingqiao, Yang, Yunhan, Li, Yangguang, Zou, Zi-Xin, Liang, Ding, Liu, Xihui, Cao, Yan-Pei, Sheng, Lu
This paper introduces MIDI, a novel paradigm for compositional 3D scene generation from a single image. Unlike existing methods that rely on reconstruction or retrieval techniques, or recent approaches that employ multi-stage object-by-object generati…
External link:
http://arxiv.org/abs/2412.03558
Author:
Yang, Yunhan, Huang, Yukun, Guo, Yuan-Chen, Lu, Liangjun, Wu, Xiaoyang, Lam, Edmund Y., Cao, Yan-Pei, Liu, Xihui
3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness powerful Vision-Language Models (VLMs) for 2D-to-3D knowledge di…
External link:
http://arxiv.org/abs/2411.07184
Author:
Qin, Yiran, Shi, Zhelun, Yu, Jiwen, Wang, Xijun, Zhou, Enshen, Li, Lijun, Yin, Zhenfei, Liu, Xihui, Sheng, Lu, Shao, Jing, Bai, Lei, Ouyang, Wanli, Zhang, Ruimao
Recent advancements in predictive models have demonstrated exceptional capabilities in predicting the future state of objects and scenes. However, the lack of categorization based on inherent characteristics continues to hinder the progress of predic…
External link:
http://arxiv.org/abs/2410.18072
Author:
Fang, Rongyao, Duan, Chengqi, Wang, Kun, Li, Hao, Tian, Hao, Zeng, Xingyu, Zhao, Rui, Dai, Jifeng, Li, Hongsheng, Liu, Xihui
Recent advancements in multimodal foundation models have yielded significant progress in vision-language understanding. Initial attempts have also explored the potential of multimodal large language models (MLLMs) for visual content generation. Howev…
External link:
http://arxiv.org/abs/2410.13861