Search results - "Kuang, Chuqiao"

Report

More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression

Authors: Zhang, Jiebin; Zhu, Dawei; Song, Yifan; Wu, Wenhao; Kuang, Chuqiao; Li, Xiaoguang; Shang, Lifeng; Liu, Qun; Li, Sujian

As large language models (LLMs) process increasing context windows, the memory usage of KV cache has become a critical bottleneck during inference. The mainstream KV compression methods, including KV pruning and KV quantization, primarily focus on ei

External link: http://arxiv.org/abs/2412.12706
