Showing 1 - 4 of 4 for search: '"Abhyankar, Reyna"'
Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practices are to include domain-specific instructions, illustration of tool usages, and/or long context such as textbook ch…
External link:
http://arxiv.org/abs/2407.00023
Large language models are increasingly integrated with external environments, tools, and agents like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed for standalone LLMs. Th…
External link:
http://arxiv.org/abs/2402.01869
Authors:
Miao, Xupeng, Oliaro, Gabriele, Zhang, Zhihao, Cheng, Xinhao, Wang, Zeyu, Zhang, Zhengxin, Wong, Rae Ying Yee, Zhu, Alan, Yang, Lijie, Shi, Xiaoxiang, Shi, Chunan, Chen, Zhuoming, Arfeen, Daiyaan, Abhyankar, Reyna, Jia, Zhihao
This paper introduces SpecInfer, a system that accelerates generative large language model (LLM) serving with tree-based speculative inference and verification. The key idea behind SpecInfer is leveraging small speculative models to predict the LLM's…
External link:
http://arxiv.org/abs/2305.09781
Authors:
Miao, Xupeng, Oliaro, Gabriele, Zhang, Zhihao, Cheng, Xinhao, Wang, Zeyu, Wong, Rae Ying Yee, Chen, Zhuoming, Arfeen, Daiyaan, Abhyankar, Reyna, Jia, Zhihao
The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them quickly and cheaply. This paper introduces SpecInfer, an LLM serving system that accelerates generative LLM inference with spe…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::22400c0508feb1a5f4f5b9ff3a6dad8a