Showing 1 - 3 of 3 results for the search: '"Duanmu, Haojie"'
Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks such as book understanding and the generation of lengthy novels. However, the key-value (KV) cache that LLMs require consumes substantial memory as context length increases…
External link:
http://arxiv.org/abs/2405.06219
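The memory pressure described in the abstract above can be illustrated with a back-of-envelope estimate. The sketch below is not from the paper; the model configuration (a generic 7B-class transformer in fp16) is an illustrative assumption:

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len,
                   batch=1, dtype_bytes=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Each token stores one key and one value vector (hence the factor 2)
    of size head_dim per attention head, per layer.
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * batch * dtype_bytes

# Assumed 7B-class configuration: 32 layers, 32 heads, head_dim 128, fp16.
# At a 128k-token context, the cache alone needs roughly 67 GB:
gb = kv_cache_bytes(32, 32, 128, seq_len=128_000) / 1e9  # ≈ 67 GB
```

The linear growth in `seq_len` is why long-context serving quickly becomes memory-bound rather than compute-bound.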
Authors:
Duan, Jiangfei; Lu, Runyu; Duanmu, Haojie; Li, Xiuhong; Zhang, Xingcheng; Lin, Dahua; Stoica, Ion; Zhang, Hao
Large language models (LLMs) have demonstrated remarkable performance, and organizations are racing to serve LLMs of varying sizes as endpoints for use cases such as chat, programming, and search. However, efficiently serving multiple LLMs poses significant challenges…
External link:
http://arxiv.org/abs/2404.02015
Large language models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process. This paper addresses these challenges by focusing on the quantization…
External link:
http://arxiv.org/abs/2402.12065
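To make the quantization theme of the abstract above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the simplest variant of the technique. It is a generic illustration, not the method proposed in the paper:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: a single scale maps max |x| to 127.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original fp32 values.
    return q.astype(np.float32) * scale

# Example: a small weight vector round-trips with bounded error.
x = np.array([0.5, -1.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
```

Storing int8 instead of fp16 halves memory; real KV-cache or weight quantization schemes add per-channel or per-group scales to control the rounding error seen in `x_hat`.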