Showing 1 - 10 of 714 for search: '"HOU, Lu"'
Author:
Wang, Chunwei, Lu, Guansong, Yang, Junwei, Huang, Runhui, Han, Jianhua, Hou, Lu, Zhang, Wei, Xu, Hang
In this paper, we introduce ILLUME, a unified multimodal large language model (MLLM) that seamlessly integrates multimodal understanding and generation capabilities within a single large language model through a unified next-token prediction formulation.
External link:
http://arxiv.org/abs/2412.06673
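The "unified next-token prediction formulation" named above can be illustrated with a toy sketch: discrete text tokens and discrete image tokens (e.g., from a VQ codebook) share one vocabulary, and a single autoregressive model predicts the next token regardless of modality. Everything below (vocabulary sizes, the tiny model) is a hypothetical illustration, not ILLUME's actual architecture.

import torch
import torch.nn as nn

# Assumed vocabulary split: text tokens and discrete image tokens share one
# vocabulary so a single LM head can predict either modality.
TEXT_VOCAB = 32000   # hypothetical text tokenizer size
IMAGE_VOCAB = 8192   # hypothetical VQ image codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB

class ToyUnifiedLM(nn.Module):
    """A toy autoregressive LM over the joint text+image token space."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)  # one head for both modalities

    def forward(self, tokens):
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=causal)
        return self.head(h)  # next-token logits over the unified vocabulary

# A prompt interleaving text tokens with image tokens (offset by TEXT_VOCAB).
text_part = torch.randint(0, TEXT_VOCAB, (1, 6))
image_part = torch.randint(TEXT_VOCAB, VOCAB, (1, 4))
logits = ToyUnifiedLM()(torch.cat([text_part, image_part], dim=1))
print(logits.shape)  # (1, 10, VOCAB): one prediction rule for either modality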
Author:
Lin, Haoran, Yu, Xianzhi, Zhao, Kang, Hou, Lu, Zhan, Zongyuan, Kamenev, Stanislav, Bao, Han, Hu, Ting, Wang, Mingkai, Chang, Qixin, Sui, Siyue, Sun, Weihao, Hu, Jiaxin, Yao, Jun, Yin, Zekun, Qian, Cheng, Zhang, Ying, Pan, Yinfei, Yang, Yu, Liu, Weiguo
The FlashAttention series has been widely applied in the inference of large language models (LLMs). However, it only supports high-end GPU architectures, e.g., Ampere and Hopper. At present, the FlashAttention series is not easily transferable …
External link:
http://arxiv.org/abs/2410.16663
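The portability issue above concerns FlashAttention's hand-tuned GPU kernels; the underlying algorithm, tiling plus online softmax, is hardware-agnostic. Below is a numpy sketch of that core idea, for clarity only, not the paper's NPU/GPU implementation.

import numpy as np

def tiled_attention(q, K, V, tile=64):
    """Online-softmax attention for one query, processing K/V in tiles.

    The algorithmic core of FlashAttention-style kernels: keep a running
    max and running denominator so the full attention-score matrix is
    never materialized in slow memory.
    """
    d = q.shape[-1]
    m = -np.inf                      # running max of scores seen so far
    l = 0.0                          # running softmax denominator
    acc = np.zeros(V.shape[-1])      # running weighted sum of values
    for start in range(0, K.shape[0], tile):
        Kt, Vt = K[start:start + tile], V[start:start + tile]
        s = Kt @ q / np.sqrt(d)              # scores for this tile only
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)       # rescale earlier partial sums
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ Vt
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=16), rng.normal(size=(256, 16)), rng.normal(size=(256, 16))
scores = K @ q / np.sqrt(16)
ref = np.exp(scores - scores.max())
ref /= ref.sum()
assert np.allclose(tiled_attention(q, K, V), ref @ V)  # matches exact attention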
Author:
Sun, Yuxuan, Liu, Ruikang, Bai, Haoli, Bao, Han, Zhao, Kang, Li, Yuening, Hu, Jiaxin, Yu, Xianzhi, Hou, Lu, Yuan, Chun, Jiang, Xin, Liu, Wulong, Yao, Jun
Recently, quantization has been widely used for the compression and acceleration of large language models (LLMs). Due to the outliers in LLMs, it is crucial to flatten weights and activations to minimize quantization error with the equally spaced quantization …
External link:
http://arxiv.org/abs/2410.09426
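To see why flattening helps an equally spaced quantizer: a single outlier stretches the uniform grid, wasting quantization levels on empty range. The sketch below flattens with a random orthogonal transform, a generic stand-in in the spirit of rotation/affine-transform quantization methods; the transform in the paper above may differ.

import numpy as np

def quantize_uniform(x, n_bits=4):
    """Symmetric, equally spaced quantizer: round onto a uniform grid."""
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=512)
x[7] = 50.0  # one outlier stretches the grid for every other value

# Flattening: an orthogonal Q spreads the outlier's energy across all
# coordinates, and Q can be undone exactly after dequantization.
Q, _ = np.linalg.qr(rng.normal(size=(512, 512)))  # random orthogonal matrix

err_plain = np.mean((x - quantize_uniform(x)) ** 2)
err_flat = np.mean((x - Q.T @ quantize_uniform(Q @ x)) ** 2)
print(f"4-bit MSE: plain={err_plain:.4f}  flattened={err_flat:.4f}")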
Author:
Chen, Kai, Gou, Yunhao, Huang, Runhui, Liu, Zhili, Tan, Daxin, Xu, Jing, Wang, Chunwei, Zhu, Yi, Zeng, Yihan, Yang, Kuo, Wang, Dingdong, Xiang, Kun, Li, Haoyuan, Bai, Haoli, Han, Jianhua, Li, Xiaohui, Jin, Weike, Xie, Nian, Zhang, Yu, Kwok, James T., Zhao, Hengshuang, Liang, Xiaodan, Yeung, Dit-Yan, Chen, Xiao, Li, Zhenguo, Zhang, Wei, Liu, Qun, Yao, Jun, Hong, Lanqing, Hou, Lu, Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end …
External link:
http://arxiv.org/abs/2409.18042
Currently, vision encoder models like Vision Transformers (ViTs) typically excel at image recognition tasks but cannot simultaneously support text recognition the way human vision does. To address this limitation, we propose UNIT, a novel training …
External link:
http://arxiv.org/abs/2409.04095
Published in:
ACM Comput. Surv. 56, 5, Article 130 (January 2024)
To alleviate the problem of information explosion, recommender systems are widely deployed to provide personalized information filtering services. Usually, embedding tables are employed in recommender systems to transform high-dimensional sparse one-hot vectors …
External link:
http://arxiv.org/abs/2408.02304
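The embedding tables the survey above targets implement a simple mechanism: looking up the row for a categorical ID, which is mathematically a product of a one-hot vector with the table. The table's size, not the computation, is the bottleneck. A minimal sketch with hypothetical dimensions:

import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, DIM = 100_000, 32   # hypothetical catalogue size / embedding width

# The embedding table: one dense row per categorical ID.
table = rng.normal(size=(NUM_ITEMS, DIM)).astype(np.float32)

item_id = 4242
one_hot = np.zeros(NUM_ITEMS, dtype=np.float32)
one_hot[item_id] = 1.0

# The one-hot matrix product and the row lookup are the same operation;
# deployments use the lookup, so memory (here 10^5 x 32 floats ~= 13 MB,
# but hundreds of GB in production) dominates, motivating compression.
assert np.allclose(one_hot @ table, table[item_id])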
Author:
Huang, Runhui, Ding, Xinpeng, Wang, Chunwei, Han, Jianhua, Liu, Yulong, Zhao, Hengshuang, Xu, Hang, Hou, Lu, Zhang, Wei, Liang, Xiaodan
High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs caused by high-resolution input, one promising direction is to …
External link:
http://arxiv.org/abs/2407.08706
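One generic way to reduce visual tokens from high-resolution inputs is to merge neighboring patch tokens before they enter the LLM. The sketch below uses 2x2 average pooling purely as an assumed illustration; the reduction strategy in the paper above may differ.

import torch

def merge_visual_tokens(tokens, grid=24, window=2):
    """Merge each window x window block of visual tokens into one token,
    cutting the LLM's visual sequence length by window**2."""
    b, n, d = tokens.shape
    assert n == grid * grid
    # Split the flattened row-major token grid into window x window blocks.
    x = tokens.view(b, grid // window, window, grid // window, window, d)
    return x.mean(dim=(2, 4)).reshape(b, -1, d)

vis = torch.randn(1, 576, 1024)        # e.g., a 24x24 grid of ViT patch tokens
print(merge_visual_tokens(vis).shape)  # torch.Size([1, 144, 1024]): 4x fewer tokens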
The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored …
External link:
http://arxiv.org/abs/2405.20985
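For context, a very common projector design is a small MLP applied to each visual patch feature (the LLaVA-style choice). The sketch below shows that generic design with assumed dimensions; it is not necessarily among the projectors evaluated in the record above.

import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer MLP projector: maps each vision-encoder patch feature
    into the LLM's token-embedding space so visual tokens can be
    concatenated with text embeddings (a generic sketch)."""
    def __init__(self, vision_dim=1024, llm_dim=4096):  # dims are assumptions
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):   # (batch, n_patches, vision_dim)
        return self.net(patch_features)  # (batch, n_patches, llm_dim)

visual_tokens = MLPProjector()(torch.randn(1, 576, 1024))
print(visual_tokens.shape)  # torch.Size([1, 576, 4096]): ready for the LLM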
Author:
Edalati, Ali, Ghaffari, Alireza, Asgharian, Masoud, Hou, Lu, Chen, Boxing, Nia, Vahid Partovi
Deployment of Large Language Models (LLMs) incurs major computational costs due to their rapidly expanding size. Compression of LLMs reduces the memory footprint, latency, and energy required for their inference. Post-training Quantization (PTQ) techniques …
External link:
http://arxiv.org/abs/2405.15025
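PTQ quantizes a trained model's weights without retraining. The simplest baseline is per-row round-to-nearest, sketched below as an illustration of the general PTQ setting, not of the method proposed in the record above.

import numpy as np

def ptq_rtn(W, n_bits=4):
    """Weight-only PTQ by per-row round-to-nearest: no retraining, just
    integer codes plus one floating-point scale per output row."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    codes = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
x = rng.normal(size=256).astype(np.float32)

codes, scale = ptq_rtn(W)
W_hat = codes * scale  # dequantize for the matrix multiply
rel_err = np.linalg.norm(W @ x - W_hat @ x) / np.linalg.norm(W @ x)
print(f"relative output error at 4 bits: {rel_err:.3%}")
# Storage drops from 16 bits per weight to ~4, plus a per-row scale,
# which is the memory/latency saving the record refers to.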
Author:
Chen, Sishuo, Li, Lei, Ren, Shuhuai, Gao, Rundong, Liu, Yuanxin, Bi, Xiaohan, Sun, Xu, Hou, Lu
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, existing models are constrained by the assumption of constant availability of …
External link:
http://arxiv.org/abs/2403.19221