Showing 1 - 1 of 1 for search: '"Sankaralingam, Ananth"'
Author:
Liu, Minghui, Rabbani, Tahseen, O'Halloran, Tony, Sankaralingam, Ananth, Hartley, Mary-Anne, Gravelle, Brian, Huang, Furong, Fermüller, Cornelia, Aloimonos, Yiannis
Transformer-based large language models (LLMs) use the key-value (KV) cache to significantly accelerate inference by storing the key and value embeddings of past tokens. However, this cache consumes considerable GPU memory. In this work, we introduce … (see the KV-cache sketch after this record).
External link:
http://arxiv.org/abs/2412.16187
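
For context on the mechanism the abstract describes, below is a minimal, illustrative Python sketch of a single-head KV cache during autoregressive decoding. It is not code from the paper: the class ToyKVCache, its random projection weights, and the dimensions are hypothetical, chosen only to show why the cache grows linearly with the number of past tokens and therefore consumes GPU memory.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyKVCache:
    """Minimal single-head attention with a KV cache: past key/value
    embeddings are stored so each decoding step projects only the
    newest token instead of recomputing K and V for the whole prefix."""

    def __init__(self, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random projections stand in for trained weight matrices.
        self.W_q = rng.standard_normal((d_model, d_model))
        self.W_k = rng.standard_normal((d_model, d_model))
        self.W_v = rng.standard_normal((d_model, d_model))
        self.keys = []    # cached key rows, one per past token
        self.values = []  # cached value rows, one per past token
        self.scale = np.sqrt(d_model)

    def decode_step(self, x: np.ndarray) -> np.ndarray:
        """Attend from the new token embedding x over all cached tokens."""
        q = x @ self.W_q
        # Only the new token is projected; earlier K/V come from the cache.
        self.keys.append(x @ self.W_k)
        self.values.append(x @ self.W_v)
        K = np.stack(self.keys)    # shape (t, d): grows with sequence length,
        V = np.stack(self.values)  # which is the memory cost the paper targets
        attn = softmax(q @ K.T / self.scale)
        return attn @ V

cache = ToyKVCache(d_model=16)
for token_embedding in np.random.default_rng(1).standard_normal((5, 16)):
    out = cache.decode_step(token_embedding)
print(f"cached tokens: {len(cache.keys)}, output dim: {out.shape}")

The trade-off the sketch makes visible: each step is cheap because K and V for past tokens are reused, but the two cached lists grow by one row per generated token, so for long contexts the cache itself becomes the dominant GPU memory consumer.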