Search results - "Abhyankar, Reyna"

Report

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

Authors: Srivatsa, Vikranth; He, Zijian; Abhyankar, Reyna; Li, Dongming; Zhang, Yiying

Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practices are to include domain-specific instructions, illustration of tool usages, and/or long context such as textbook ch

External link: http://arxiv.org/abs/2407.00023

Report

InferCept: Efficient Intercept Support for Augmented Large Language Model Inference

Authors: Abhyankar, Reyna; He, Zijian; Srivatsa, Vikranth; Zhang, Hao; Zhang, Yiying

Large language models are increasingly integrated with external environments, tools, and agents like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed for standalone LLMs. Th

External link: http://arxiv.org/abs/2402.01869

Report

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

Authors: Miao, Xupeng; Oliaro, Gabriele; Zhang, Zhihao; Cheng, Xinhao; Wang, Zeyu; Zhang, Zhengxin; Wong, Rae Ying Yee; Zhu, Alan; Yang, Lijie; Shi, Xiaoxiang; Shi, Chunan; Chen, Zhuoming; Arfeen, Daiyaan; Abhyankar, Reyna; Jia, Zhihao

This paper introduces SpecInfer, a system that accelerates generative large language model (LLM) serving with tree-based speculative inference and verification. The key idea behind SpecInfer is leveraging small speculative models to predict the LLM's

External link: http://arxiv.org/abs/2305.09781

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

Authors: Miao, Xupeng; Oliaro, Gabriele; Zhang, Zhihao; Cheng, Xinhao; Wang, Zeyu; Wong, Rae Ying Yee; Chen, Zhuoming; Arfeen, Daiyaan; Abhyankar, Reyna; Jia, Zhihao

The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them quickly and cheaply. This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference with spe

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::22400c0508feb1a5f4f5b9ff3a6dad8a
