Showing 1 - 10 of 47
for the search query: '"Cai, Tianle"'
Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhib…
External link:
http://arxiv.org/abs/2408.14690
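The idea in the snippet above can be illustrated with a toy NumPy sketch: when the activation vector is sparse, a matrix-vector product only needs the weight columns paired with nonzero activations. This is purely illustrative (the names are hypothetical, and real systems exploit sparsity in GPU kernels, not with fancy indexing):

```python
import numpy as np

def sparse_matvec(W, x):
    """Toy activation-sparsity matvec: touch only the columns of W
    that are paired with nonzero entries of the activation vector x."""
    nz = np.nonzero(x)[0]        # indices of active neurons
    return W[:, nz] @ x[nz]      # skips columns multiplied by zero

W = np.arange(12, dtype=float).reshape(3, 4)
x = np.array([1.0, 0.0, 0.0, 2.0])   # 50% of activations are zero
dense = W @ x                        # full computation
sparse = sparse_matvec(W, x)         # same result, half the columns
```

With 50% sparsity the sparse path reads half the weight columns, which is where the compute and memory-movement savings come from.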
Author:
Li, Junyan, Chen, Delin, Cai, Tianle, Chen, Peihao, Hong, Yining, Chen, Zhenfang, Shen, Yikang, Gan, Chuang
Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we propose FlexAt…
External link:
http://arxiv.org/abs/2407.20228
Author:
Li, Andrew, Feng, Xianle, Narang, Siddhant, Peng, Austin, Cai, Tianle, Shah, Raj Sanjay, Varma, Sashank
When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times…
External link:
http://arxiv.org/abs/2405.16042
Author:
Li, Yuhong, Huang, Yingbing, Yang, Bowen, Venkitesh, Bharat, Locatelli, Acyr, Ye, Hanchen, Cai, Tianle, Lewis, Patrick, Chen, Deming
Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length…
External link:
http://arxiv.org/abs/2404.14469
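The KV-cache growth mentioned in the snippet above can be sketched in a few lines: each decoding step appends one key and one value vector per layer, so memory grows linearly with sequence length. This is a minimal sketch with hypothetical shapes (real caches also carry batch, layer, and head dimensions):

```python
import numpy as np

class KVCache:
    """Minimal per-layer Key-Value cache: one (key, value) pair of
    vectors is appended per generated token."""
    def __init__(self, head_dim):
        self.keys, self.values = [], []
        self.head_dim = head_dim

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def size_floats(self):
        # 2 tensors (K and V) * seq_len * head_dim
        return 2 * len(self.keys) * self.head_dim

cache = KVCache(head_dim=64)
for _ in range(128):                         # decode 128 tokens
    cache.append(np.zeros(64), np.zeros(64))
total = cache.size_floats()                  # 2 * 128 * 64 = 16384 floats
```

Doubling the generated length doubles `total`, which is why long inputs make the KV cache a dominant memory cost.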
Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM traine…
External link:
http://arxiv.org/abs/2404.07413
Author:
Zhao, Yiran, Zheng, Wenyue, Cai, Tianle, Do, Xuan Long, Kawaguchi, Kenji, Goyal, Anirudh, Shieh, Michael
Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break aligned LLMs, but optimization of GCG is time-…
External link:
http://arxiv.org/abs/2403.01251
Author:
Li, Muyang, Cai, Tianle, Cao, Jiaxin, Zhang, Qinsheng, Cai, Han, Bai, Junjie, Jia, Yangqing, Liu, Ming-Yu, Li, Kai, Han, Song
Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for in…
External link:
http://arxiv.org/abs/2402.19481
Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it is intuitive to assume that fine-tuning ad…
External link:
http://arxiv.org/abs/2402.10193
Author:
Cai, Tianle, Li, Yuhong, Geng, Zhengyang, Peng, Hongwu, Lee, Jason D., Chen, Deming, Dao, Tri
Large Language Models (LLMs) employ auto-regressive decoding, which requires sequential computation, with each step dependent on the previous one's output. This creates a bottleneck, as each step necessitates moving the full model parameters from High-Ban…
External link:
http://arxiv.org/abs/2401.10774
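The sequential dependence described in the snippet above can be illustrated with a toy decoding loop, where the `next_token` callable is a hypothetical stand-in for a full model forward pass (a sketch only, not this paper's method):

```python
def greedy_decode(next_token, prompt, n_steps):
    """Auto-regressive decoding sketch: each step consumes the full
    sequence so far and emits one token, so the n_steps calls cannot
    be run in parallel — step t needs step t-1's output."""
    seq = list(prompt)
    for _ in range(n_steps):
        seq.append(next_token(seq))   # sequential dependency
    return seq

# toy "model": next token is the sum of the last two tokens, mod 10
out = greedy_decode(lambda s: (s[-1] + s[-2]) % 10, [1, 1], 5)
```

Because every iteration re-reads the model state, each of the `n_steps` calls pays the full parameter-movement cost, which is the bottleneck the abstract refers to.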
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain…
External link:
http://arxiv.org/abs/2311.08252
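The retrieval idea behind the snippet above can be sketched in miniature: find the longest suffix of the current context that also occurs in a reference token sequence, and propose the tokens that followed it there as a draft. This is a toy illustration only; REST's actual datastore construction and the verification of drafts by the target model are omitted, and all names here are hypothetical:

```python
def retrieve_draft(datastore, context, max_suffix=4, draft_len=3):
    """Find the longest suffix of `context` (up to max_suffix tokens)
    that occurs in `datastore`, and return the next draft_len tokens
    that followed it there as a speculative draft."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-n:]
        for i in range(len(datastore) - n + 1):
            if datastore[i:i + n] == suffix:
                return datastore[i + n:i + n + draft_len]
    return []   # no match: nothing to speculate

store = list("the cat sat on the mat")
draft = retrieve_draft(store, list("on the"))
# drafts the characters that followed " the" in the datastore: " ma"
```

The drafted tokens would then be checked in a single forward pass of the target model, accepting the matching prefix, which is where the speedup over token-by-token decoding comes from.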