Authors:
Zuhri, Zayd Muhammad Kawakibi; Adilazuarda, Muhammad Farid; Purwarianti, Ayu; Aji, Alham Fikri
Auto-regressive inference of transformers benefits greatly from Key-Value (KV) caching, but the cache can lead to major memory bottlenecks as model size, batch size, and sequence length grow at scale. We introduce Multi-Layer Key-Value (MLKV) sharing, a novel a
External link:
http://arxiv.org/abs/2406.09297