Search results

Report

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Authors: Zhao, Yilong; Yang, Shuo; Zhu, Kan; Zheng, Lianmin; Kasikci, Baris; Zhou, Yang; Xing, Jiarong; Stoica, Ion

Offline batch inference, which leverages the flexibility of request batching to achieve higher throughput and lower costs, is becoming more popular for latency-insensitive applications. Meanwhile, recent progress in model capability and modality make

External link: http://arxiv.org/abs/2411.16102

Report

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Authors: Cao, Shiyi; Liu, Shu; Griggs, Tyler; Schafhalter, Peter; Liu, Xiaoxuan; Sheng, Ying; Gonzalez, Joseph E.; Zaharia, Matei; Stoica, Ion

Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency and memory utilization. The MoE architecture, ren

External link: http://arxiv.org/abs/2411.11217

Report

Pie: Pooling CPU Memory for LLM Inference

Authors: Xu, Yi; Mao, Ziming; Mo, Xiangxi; Liu, Shu; Stoica, Ion

The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memor

External link: http://arxiv.org/abs/2411.09317

Report

Min-Max Framework for Majorization-Minimization Algorithms in Signal Processing Applications: An Overview

Authors: Saini, Astha; Stoica, Petre; Babu, Prabhu; Arora, Aakash

Published in: Foundations and Trends in Signal Processing, 2024, Vol. 18, No. 4, pp. 310-389

This monograph presents a theoretical background and a broad introduction to the Min-Max Framework for Majorization-Minimization (MM4MM), an algorithmic methodology for solving minimization problems by formulating them as min-max problems and then em

External link: http://arxiv.org/abs/2411.07561

Report

Terahertz-permittivity of Carbon Nitrides: Revealing humidity-enhanced dielectric properties on the picosecond timescales relevant for charge carrier photogeneration

Authors: Jahangir, Reehab; Podjaski, Filip; Alimard, Paransa; Hillman, Sam A. J.; Davidson, Stuart; Stoica, Stefan; Kafizas, Andreas; Naftaly, Mira; Durrant, James R.

Organic based semiconductor materials offer emerging and sustainable solutions for solar energy conversion technologies and electronics. However, knowledge of their intrinsic (photo)physical properties and light-matter interactions is often limited,

External link: http://arxiv.org/abs/2411.06226

Report

SkyServe: Serving AI Models across Regions and Clouds with Spot Instances

Authors: Mao, Ziming; Xia, Tian; Wu, Zhanghao; Chiang, Wei-Lin; Griggs, Tyler; Bhardwaj, Romil; Yang, Zongheng; Shenker, Scott; Stoica, Ion

Recent years have witnessed an explosive growth of AI models. The high cost of hosting AI services on GPUs and their demanding service requirements, make it timely and challenging to lower service costs and guarantee service quality. While spot insta

External link: http://arxiv.org/abs/2411.01438

Report

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Authors: Jiang, Xuanlin; Zhou, Yang; Cao, Shiyi; Stoica, Ion; Yu, Minlan

Online LLM inference powers many exciting applications such as intelligent chatbots and autonomous agents. Modern LLM inference engines widely rely on request batching to improve inference throughput, aiming to make it cost-efficient when running on

External link: http://arxiv.org/abs/2411.01142

Report

Model merging with SVD to tie the Knots

Authors: Stoica, George; Ramesh, Pratik; Ecsedi, Boglarka; Choshen, Leshem; Hoffman, Judy

Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when mergin

External link: http://arxiv.org/abs/2410.19735

Report

Managing Bandwidth: The Key to Cloud-Assisted Autonomous Driving

Authors: Krentsel, Alexander; Schafhalter, Peter; Gonzalez, Joseph E.; Ratnasamy, Sylvia; Shenker, Scott; Stoica, Ion

Prevailing wisdom asserts that one cannot rely on the cloud for critical real-time control systems like self-driving cars. We argue that we can, and must. Following the trends of increasing model sizes, improvements in hardware, and evolving mobile n

External link: http://arxiv.org/abs/2410.16227

Report

How to Evaluate Reward Models for RLHF

Authors: Frick, Evan; Li, Tianle; Chen, Connor; Chiang, Wei-Lin; Angelopoulos, Anastasios N.; Jiao, Jiantao; Zhu, Banghua; Gonzalez, Joseph E.; Stoica, Ion

We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). The gold-standard approach is to run a full RLHF training pipeline and directly

External link: http://arxiv.org/abs/2410.14872
