Showing 1 - 10 of 1,694 for search: '"A. Torrellas"'
Author:
Iliakopoulou, Nikoleta, Stojkovic, Jovan, Alverti, Chloe, Xu, Tianyin, Franke, Hubertus, Torrellas, Josep
The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for different LLM downstream tasks. To handle multi-task se…
External link:
http://arxiv.org/abs/2411.17741
Author:
Chen, Deming, Youssef, Alaa, Pendse, Ruchi, Schleife, André, Clark, Bryan K., Hamann, Hendrik, He, Jingrui, Laino, Teodoro, Varshney, Lav, Wang, Yuxiong, Sil, Avirup, Jabbarvand, Reyhaneh, Xu, Tianyin, Kindratenko, Volodymyr, Costa, Carlos, Adve, Sarita, Mendis, Charith, Zhang, Minjia, Núñez-Corrales, Santiago, Ganti, Raghu, Srivatsa, Mudhakar, Kim, Nam Sung, Torrellas, Josep, Huang, Jian, Seelam, Seetharami, Nahrstedt, Klara, Abdelzaher, Tarek, Eilam, Tamar, Zhao, Huimin, Manica, Matteo, Iyer, Ravishankar, Hirzel, Martin, Adve, Vikram, Marinov, Darko, Franke, Hubertus, Tong, Hanghang, Ainsworth, Elizabeth, Zhao, Han, Vasisht, Deepak, Do, Minh, Oliveira, Fabio, Pacifici, Giovanni, Puri, Ruchir, Nagpurkar, Priya
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co…
External link:
http://arxiv.org/abs/2411.13239
Author:
Ranawaka, Isuru, Hussain, Md Taufique, Block, Charles, Gerogiannis, Gerasimos, Torrellas, Josep, Azad, Ariful
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximiz…
External link:
http://arxiv.org/abs/2408.11988
The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs)…
External link:
http://arxiv.org/abs/2408.00741
Published in:
29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024), Volume 2, pages 582-600, La Jolla, CA, USA, May 2024
Last-level cache side-channel attacks have been mostly demonstrated in highly-controlled, quiescent local environments. Hence, it is unclear whether such attacks are feasible in a production cloud environment. In the cloud, side channels are flooded…
External link:
http://arxiv.org/abs/2405.12469
With the ubiquitous use of modern large language models (LLMs) across industries, the inference serving for these models is ever expanding. Given the high compute and memory requirements of modern LLMs, more and more top-of-the-line GPUs are being de…
External link:
http://arxiv.org/abs/2403.20306
Author:
Lenadora, Damitha, Sathia, Vimarsh, Gerogiannis, Gerasimos, Yesil, Serif, Torrellas, Josep, Mendis, Charith
Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). Compared to the optimizations explored in these systems, we observe that different matrix re-associations of GNN computations l…
External link:
http://arxiv.org/abs/2306.15155
Side-channel attacks that use machine learning (ML) for signal analysis have become prominent threats to computer security, as ML models easily find patterns in signals. To address this problem, this paper explores using Adversarial Machine Learning…
External link:
http://arxiv.org/abs/2302.01474
Byte-addressable, non-volatile memory (NVM) is emerging as a promising technology. To facilitate its wide adoption, employing NVM in managed runtimes like JVM has proven to be an effective approach (i.e., managed NVM). However, such an approach is ru…
External link:
http://arxiv.org/abs/2205.06444
Author:
Kokolis, Apostolos, Mantri, Namrata, Ganapathy, Shrikanth, Torrellas, Josep, Kalamatianos, John
The increased memory demands of workloads are putting high pressure on Last Level Caches (LLCs). Unfortunately, there is limited opportunity to increase the capacity of LLCs due to the area and power requirements of the underlying SRAM technology. Int…
External link:
http://arxiv.org/abs/2112.10632