Showing 1 - 10 of 29 for search: '"Khandelwal, Anurag"'
We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing attention states across different LLM prompts. Many input prompts have overlapping text segments, such as system messages, prompt templates, an
External link:
http://arxiv.org/abs/2311.04934
Author:
Vuppalapati, Midhul, Fikioris, Giannis, Agarwal, Rachit, Cidon, Asaf, Khandelwal, Anurag, Tardos, Eva
We consider the problem of fair resource allocation in a system where user demands are dynamic, that is, where user demands vary over time. Our key observation is that the classical max-min fairness algorithm for resource allocation provides many des
External link:
http://arxiv.org/abs/2305.17222
Caches at CPU nodes in disaggregated memory architectures amortize the high data access latency over the network. However, such caches are fundamentally unable to improve performance for workloads requiring pointer traversals across linked data struc
External link:
http://arxiv.org/abs/2305.02388
Author:
Sriram, Karthik, Pothukuchi, Raghavendra Pradyumna, Gerasimiuk, Michał, Ye, Oliver, Ugur, Muhammed, Manohar, Rajit, Khandelwal, Anurag, Bhattacharjee, Abhishek
Hull is an accelerator-rich distributed implantable Brain-Computer Interface (BCI) that reads biological neurons at data rates that are 2-3 orders of magnitude higher than the prior state of art, while supporting many neuroscientific applications. Pr
External link:
http://arxiv.org/abs/2301.03103
We explore the design of scalable synchronization primitives for disaggregated shared memory. Porting existing synchronization primitives to disaggregated shared memory results in poor scalability with the number of application threads because they l
External link:
http://arxiv.org/abs/2301.02576
Published in:
In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pp. 719-734. 2022
Many applications that benefit from data offload to cloud services operate on private data. A now-long line of work has shown that, even when data is offloaded in an encrypted form, an adversary can learn sensitive information by analyzing data acces
External link:
http://arxiv.org/abs/2205.14281
Author:
Lee, Seung-seob, Yu, Yanpeng, Tang, Yupeng, Khandelwal, Anurag, Zhong, Lin, Bhattacharjee, Abhishek
Published in:
SOSP '21 (2021) 488-504
Memory-compute disaggregation promises transparent elasticity, high utilization and balanced usage for resources in data centers by physically separating memory and compute into network-attached resource "blades". However, existing designs achieve pe
External link:
http://arxiv.org/abs/2107.00164
Author:
Jonas, Eric, Schleier-Smith, Johann, Sreekanti, Vikram, Tsai, Chia-Che, Khandelwal, Anurag, Pu, Qifan, Shankar, Vaishaal, Carreira, Joao, Krauth, Karl, Yadwadkar, Neeraja, Gonzalez, Joseph E., Popa, Raluca Ada, Stoica, Ion, Patterson, David A.
Serverless cloud computing handles virtually all the system administration operations needed to make it easier for programmers to use the cloud. It provides an interface that greatly simplifies cloud programming, and represents an evolution that para
External link:
http://arxiv.org/abs/1902.03383