Showing 1 - 1 of 1 for search: '"Kim, Hongbeen"'
Recent large language models (LLMs) with enormous model sizes use many GPUs to meet memory capacity requirements, incurring substantial costs for token generation. To provide cost-effective LLM inference with relaxed latency constraints, extensive res…
External link:
http://arxiv.org/abs/2501.01792