Showing 1 - 10 of 8,286 results for search: '"STOICA, P."'
Authors:
Zhao, Yilong; Yang, Shuo; Zhu, Kan; Zheng, Lianmin; Kasikci, Baris; Zhou, Yang; Xing, Jiarong; Stoica, Ion
Offline batch inference, which leverages the flexibility of request batching to achieve higher throughput and lower costs, is becoming more popular for latency-insensitive applications. Meanwhile, recent progress in model capability and modality make…
External link:
http://arxiv.org/abs/2411.16102
Authors:
Cao, Shiyi; Liu, Shu; Griggs, Tyler; Schafhalter, Peter; Liu, Xiaoxuan; Sheng, Ying; Gonzalez, Joseph E.; Zaharia, Matei; Stoica, Ion
Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency and memory utilization. The MoE architecture, ren…
External link:
http://arxiv.org/abs/2411.11217
The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memor…
External link:
http://arxiv.org/abs/2411.09317
Published in:
Foundations and Trends in Signal Processing 2024: Vol. 18: No. 4, pp. 310-389
This monograph presents a theoretical background and a broad introduction to the Min-Max Framework for Majorization-Minimization (MM4MM), an algorithmic methodology for solving minimization problems by formulating them as min-max problems and then em…
External link:
http://arxiv.org/abs/2411.07561
Authors:
Jahangir, Reehab; Podjaski, Filip; Alimard, Paransa; Hillman, Sam A. J.; Davidson, Stuart; Stoica, Stefan; Kafizas, Andreas; Naftaly, Mira; Durrant, James R.
Organic-based semiconductor materials offer emerging and sustainable solutions for solar energy conversion technologies and electronics. However, knowledge of their intrinsic (photo)physical properties and light-matter interactions is often limited, …
External link:
http://arxiv.org/abs/2411.06226
Authors:
Mao, Ziming; Xia, Tian; Wu, Zhanghao; Chiang, Wei-Lin; Griggs, Tyler; Bhardwaj, Romil; Yang, Zongheng; Shenker, Scott; Stoica, Ion
Recent years have witnessed an explosive growth of AI models. The high cost of hosting AI services on GPUs and their demanding service requirements make it timely and challenging to lower service costs and guarantee service quality. While spot insta…
External link:
http://arxiv.org/abs/2411.01438
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on…
External link:
http://arxiv.org/abs/2411.01142
Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when mergin…
External link:
http://arxiv.org/abs/2410.19735
Authors:
Krentsel, Alexander; Schafhalter, Peter; Gonzalez, Joseph E.; Ratnasamy, Sylvia; Shenker, Scott; Stoica, Ion
Prevailing wisdom asserts that one cannot rely on the cloud for critical real-time control systems like self-driving cars. We argue that we can, and must. Following the trends of increasing model sizes, improvements in hardware, and evolving mobile n…
External link:
http://arxiv.org/abs/2410.16227
Authors:
Frick, Evan; Li, Tianle; Chen, Connor; Chiang, Wei-Lin; Angelopoulos, Anastasios N.; Jiao, Jiantao; Zhu, Banghua; Gonzalez, Joseph E.; Stoica, Ion
We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). The gold-standard approach is to run a full RLHF training pipeline and directly…
External link:
http://arxiv.org/abs/2410.14872