Showing 1 - 10 of 114 results for the search: '"Pekhimenko, Gennady"'
GPU underutilization is a significant concern in many production deep learning clusters, leading to prolonged job queues and increased operational expenses. A promising solution to this inefficiency is GPU sharing, which improves resource utilization…
External link:
http://arxiv.org/abs/2410.07381
Author:
Dong, Honghua, Su, Qidong, Gao, Yubo, Li, Zhaoyu, Ruan, Yangjun, Pekhimenko, Gennady, Maddison, Chris J., Si, Xujie
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging…
External link:
http://arxiv.org/abs/2406.13161
Author:
Gao, Yubo, Haghifam, Maryam, Giannoula, Christina, Tu, Renbo, Pekhimenko, Gennady, Vijaykumar, Nandita
Deep learning (DL) models have revolutionized numerous domains, yet optimizing them for computational efficiency remains a challenging endeavor. Development of new DL models typically involves two parties: the model developers and performance optimizers…
External link:
http://arxiv.org/abs/2404.12512
Author:
Giannoula, Christina, Yang, Peiming, Vega, Ivan Fernandez, Yang, Jiacheng, Durvasula, Sankeerth, Li, Yu Xin, Sadrosadati, Mohammad, Luna, Juan Gomez, Mutlu, Onur, Pekhimenko, Gennady
Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total time, being significantly bottlenecked…
External link:
http://arxiv.org/abs/2402.16731
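The memory-intensive aggregation kernel this abstract refers to can be illustrated with a minimal sketch in plain Python. This is a toy, not code from the paper: function and variable names are ours, and real systems run this gather/scatter pattern on GPUs over millions of edges.

```python
# Minimal sketch of the memory-intensive neighbor-aggregation
# (gather/scatter) step in GNN execution. Illustrative only.

def aggregate(features, edges, num_nodes):
    """Sum each node's neighbor features (mean/max are also common).

    features: list of per-node feature vectors
    edges: list of (src, dst) pairs; dst aggregates from src
    """
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        # The irregular, data-dependent reads of features[src] are what
        # make this kernel memory-bound rather than compute-bound.
        for k in range(dim):
            out[dst][k] += features[src][k]
    return out

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 2), (1, 2), (2, 0)]
result = aggregate(features, edges, 3)
# Node 2 sums the features of nodes 0 and 1: [4.0, 6.0]
```

Each edge triggers a read at an address determined only at run time, which is why this phase, rather than the dense matrix multiplies, dominates GNN execution time.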
Modern workloads demand increasingly large memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this trend by utilizing traditional DRAM alongside slow-tier CXL-memory devices…
External link:
http://arxiv.org/abs/2312.04789
Author:
Yang, Jiacheng, Giannoula, Christina, Wu, Jun, Elhoushi, Mostafa, Gleeson, James, Pekhimenko, Gennady
Sparse Convolution (SC) is widely used for processing 3D point clouds, which are inherently sparse. Unlike dense convolution, SC preserves the sparsity of the input point cloud by allowing outputs only at specific locations. To efficiently compute…
External link:
http://arxiv.org/abs/2401.06145
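The core idea of sparse convolution can be sketched with a toy 1-D, submanifold-style example, assuming a 3-tap kernel; names here are hypothetical and real SC libraries operate on 3-D coordinates and precompute a kernel map of (input, output, offset) pairs.

```python
# Toy 1-D sparse convolution: outputs are computed only at already-active
# coordinates, so the sparsity pattern of the input is preserved.

def sparse_conv1d(active, kernel):
    """active: dict {coordinate: value}; kernel: 3 weights for offsets -1, 0, 1."""
    out = {}
    for c in active:                     # iterate only active output sites
        s = 0.0
        for off, w in zip((-1, 0, 1), kernel):
            v = active.get(c + off)      # gather only inputs that exist
            if v is not None:
                s += w * v
        out[c] = s
    return out

inp = {0: 1.0, 1: 2.0, 10: 3.0}          # mostly-empty coordinate space
res = sparse_conv1d(inp, (0.5, 1.0, 0.5))
# res == {0: 2.0, 1: 2.5, 10: 3.0}
```

A dense convolution over the same range would visit every coordinate from 0 to 10; the sparse version touches only the three active sites, which is where the efficiency of SC on point clouds comes from.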
Static and dynamic computational graphs represent two distinct approaches to constructing deep learning frameworks. The former prioritizes compiler-based optimizations, while the latter focuses on programmability and user-friendliness. The recent release…
External link:
http://arxiv.org/abs/2310.20078
Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines. However, LLM execution is inherently sequential, since they produce only one token at a time, thus incurring low…
External link:
http://arxiv.org/abs/2310.18813
Author:
Tu, Renbo, White, Colin, Kossaifi, Jean, Bonev, Boris, Kovachki, Nikola, Pekhimenko, Gennady, Azizzadenesheli, Kamyar, Anandkumar, Anima
Neural operators, such as Fourier Neural Operators (FNO), form a principled approach for learning solution operators for PDEs and other mappings between function spaces. However, many real-world problems require high-resolution training data…
External link:
http://arxiv.org/abs/2307.15034
Published in:
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2023
Stream processing engines (SPEs) are widely used for large-scale streaming analytics over unbounded, time-ordered data streams. Modern streaming analytics applications exhibit diverse compute characteristics and demand strict latency and throughput…
External link:
http://arxiv.org/abs/2301.12030