Showing 1 - 10 of 53 for search: '"Goiri, Inigo"'
Authors:
Agrawal, Amey, Chen, Junda, Goiri, Íñigo, Ramjee, Ramachandran, Zhang, Chaojie, Tumanov, Alexey, Choukse, Esha
As large language models (LLMs) evolve to handle increasingly longer contexts, serving inference requests for context lengths in the range of millions of tokens presents unique challenges. While existing techniques are effective for training, they fa
External link:
http://arxiv.org/abs/2409.17264
Authors:
Jain, Kunal, Parayil, Anjaly, Mallick, Ankur, Choukse, Esha, Qin, Xiaoting, Zhang, Jue, Goiri, Íñigo, Wang, Rujia, Bansal, Chetan, Rühle, Victor, Kulkarni, Anoop, Kofsky, Steve, Rajmohan, Saravan
Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster. However exist
External link:
http://arxiv.org/abs/2408.13510
The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SL
External link:
http://arxiv.org/abs/2408.00741
Authors:
Parayil, Anjaly, Zhang, Jue, Qin, Xiaoting, Goiri, Íñigo, Huang, Lexiang, Zhu, Timothy, Bansal, Chetan
Cloud providers introduce features (e.g., Spot VMs, Harvest VMs, and Burstable VMs) and optimizations (e.g., oversubscription, auto-scaling, power harvesting, and overclocking) to improve efficiency and reliability. To effectively utilize these featu
External link:
http://arxiv.org/abs/2405.07250
Authors:
Huang, Lexiang, Parayil, Anjaly, Zhang, Jue, Qin, Xiaoting, Bansal, Chetan, Stojkovic, Jovan, Zardoshti, Pantea, Misra, Pulkit, Cortez, Eli, Ghelman, Raphael, Goiri, Íñigo, Rajmohan, Saravan, Kleewein, Jim, Fonseca, Rodrigo, Zhu, Timothy, Bianchini, Ricardo
Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receiv
External link:
http://arxiv.org/abs/2404.19143
With the ubiquitous use of modern large language models (LLMs) across industries, the inference serving for these models is ever expanding. Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being de
External link:
http://arxiv.org/abs/2403.20306
Authors:
Saurez, Enrique, Fried, Joshua, Chaudhry, Gohar Irfan, Choukse, Esha, Goiri, Íñigo, Elnikety, Sameh, Belay, Adam, Fonseca, Rodrigo
This report explores the use of kernel-bypass networking in FaaS runtimes and demonstrates how using Junction, a novel kernel-bypass system, as the backend for executing components in faasd can enhance performance and isolation. Junction achieves thi
External link:
http://arxiv.org/abs/2403.03377
Authors:
Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Shah, Aashaka, Goiri, Íñigo, Maleki, Saeed, Bianchini, Ricardo
Recent innovations in generative large language models (LLMs) have made their applications and use-cases ubiquitous. This has led to large-scale deployments of these models, using complex, expensive, and power-hungry AI accelerators, most commonly GP
External link:
http://arxiv.org/abs/2311.18677
Authors:
Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Goiri, Íñigo, Warrier, Brijesh, Mahalingam, Nithish, Bianchini, Ricardo
Recent innovation in large language models (LLMs), and their myriad use-cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial plans of growth in their datacen
External link:
http://arxiv.org/abs/2308.12908
Authors:
Romero, Francisco, Chaudhry, Gohar Irfan, Goiri, Íñigo, Gopa, Pragna, Batum, Paul, Yadwadkar, Neeraja J., Fonseca, Rodrigo, Kozyrakis, Christos, Bianchini, Ricardo
Function-as-a-Service (FaaS) has become an increasingly popular way for users to deploy their applications without the burden of managing the underlying infrastructure. However, existing FaaS platforms rely on remote storage to maintain state, limiti
External link:
http://arxiv.org/abs/2104.13869