Authors:
Zhang, Jiebin, Zhu, Dawei, Song, Yifan, Wu, Wenhao, Kuang, Chuqiao, Li, Xiaoguang, Shang, Lifeng, Liu, Qun, Li, Sujian
As large language models (LLMs) process increasing context windows, the memory usage of KV cache has become a critical bottleneck during inference. The mainstream KV compression methods, including KV pruning and KV quantization, primarily focus on ei
External link:
http://arxiv.org/abs/2412.12706
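The abstract says the KV cache becomes a memory bottleneck as context windows grow. The sketch below makes that growth concrete with a back-of-the-envelope calculation. It assumes a standard decoder-only transformer that stores one key tensor and one value tensor per layer. The model shape (32 layers, 32 KV heads, head dimension 128, 16-bit values) is hypothetical and is not taken from the paper. The 25% token-keep ratio for pruning and the 4-bit width for quantization are illustrative placeholders only. They are not the compression method proposed by the authors.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bits_per_value=16):
    """Approximate KV cache size in bytes for a decoder-only transformer."""
    # Factor 2: each layer caches one key tensor and one value tensor.
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value // 8


# Hypothetical 7B-class model shape (assumption, not from the paper).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for seq_len in (4_096, 32_768, 131_072):
    full = kv_cache_bytes(seq_len=seq_len, **cfg)
    # Illustrative token pruning: keep 25% of tokens at 16 bits.
    pruned = kv_cache_bytes(seq_len=seq_len // 4, **cfg)
    # Illustrative quantization: keep every token, store values at 4 bits.
    quantized = kv_cache_bytes(seq_len=seq_len, bits_per_value=4, **cfg)
    print(f"{seq_len:>7} tokens: full {full / 2**30:.1f} GiB, "
          f"pruned {pruned / 2**30:.1f} GiB, 4-bit {quantized / 2**30:.1f} GiB")
```

With this assumed shape, each token costs 0.5 MiB at 16 bits. A 128K-token context therefore needs 64 GiB for the cache alone. Pruning reduces the token count, while quantization reduces the bits per value. In this toy setting, a 4× cut along either axis gives the same saving. The two approaches can also be combined, which is the trade-off space the abstract refers to.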