Showing 1 - 1 of 1 for search: '"Rong, Yutian"'
Current Retrieval-Augmented Generation (RAG) systems concatenate and process numerous retrieved document chunks during prefill, which requires a large volume of computation and therefore leads to significant latency in time-to-first-token (TTFT). To reduc
External link:
http://arxiv.org/abs/2410.07590