Search results - "Kumar, Abhishek Vijaya"

Report

Responsive ML inference in multi-tenanted environments using AQUA

Authors: Kumar, Abhishek Vijaya; Antichi, Gianni; Singh, Rachee

Modern model serving engines infer prompts on large language models in batches. While batch processing prompts leads to high inference throughput, it delays responding to requests that do not fit in a batch, potentially starving them. We propose that

External link: http://arxiv.org/abs/2407.21255
