Showing 1 - 10 of 26,180 results for search: '"A. Stoica"'
The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memor
External link:
http://arxiv.org/abs/2411.09317
Published in:
Foundations and Trends in Signal Processing 2024: Vol. 18: No. 4, pp 310-389
This monograph presents a theoretical background and a broad introduction to the Min-Max Framework for Majorization-Minimization (MM4MM), an algorithmic methodology for solving minimization problems by formulating them as min-max problems and then em
External link:
http://arxiv.org/abs/2411.07561
Authors:
Jahangir, Reehab; Podjaski, Filip; Alimard, Paransa; Hillman, Sam A. J.; Davidson, Stuart; Stoica, Stefan; Kafizas, Andreas; Naftaly, Mira; Durrant, James R.
Organic based semiconductor materials offer emerging and sustainable solutions for solar energy conversion technologies and electronics. However, knowledge of their intrinsic (photo)physical properties and light-matter interactions is often limited,
External link:
http://arxiv.org/abs/2411.06226
Authors:
Mao, Ziming; Xia, Tian; Wu, Zhanghao; Chiang, Wei-Lin; Griggs, Tyler; Bhardwaj, Romil; Yang, Zongheng; Shenker, Scott; Stoica, Ion
Recent years have witnessed an explosive growth of AI models. The high cost of hosting AI services on GPUs and their demanding service requirements make it timely and challenging to lower service costs and guarantee service quality. While spot insta
External link:
http://arxiv.org/abs/2411.01438
Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on
External link:
http://arxiv.org/abs/2411.01142
Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when mergin
External link:
http://arxiv.org/abs/2410.19735
Authors:
Krentsel, Alexander; Schafhalter, Peter; Gonzalez, Joseph E.; Ratnasamy, Sylvia; Shenker, Scott; Stoica, Ion
Prevailing wisdom asserts that one cannot rely on the cloud for critical real-time control systems like self-driving cars. We argue that we can, and must. Following the trends of increasing model sizes, improvements in hardware, and evolving mobile n
External link:
http://arxiv.org/abs/2410.16227
Authors:
Frick, Evan; Li, Tianle; Chen, Connor; Chiang, Wei-Lin; Angelopoulos, Anastasios N.; Jiao, Jiantao; Zhu, Banghua; Gonzalez, Joseph E.; Stoica, Ion
We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). The gold-standard approach is to run a full RLHF training pipeline and directly
External link:
http://arxiv.org/abs/2410.14872
Authors:
Tan, Sijun; Zhuang, Siyuan; Montgomery, Kyle; Tang, William Y.; Cuadron, Alejandro; Wang, Chenguang; Popa, Raluca Ada; Stoica, Ion
LLM-based judges have emerged as a scalable alternative to human evaluation and are increasingly used to assess, compare, and improve models. However, the reliability of LLM-based judges themselves is rarely scrutinized. As LLMs become more advanced,
External link:
http://arxiv.org/abs/2410.12784
In Large Language Model (LLM) inference, the output length of an LLM request is typically regarded as not known a priori. Consequently, most LLM serving systems employ a simple First-come-first-serve (FCFS) scheduling strategy, leading to Head-Of-Lin
External link:
http://arxiv.org/abs/2408.15792