Showing 1 - 10 of 114 results for the search: '"Pekhimenko, Gennady"'
GPU underutilization is a significant concern in many production deep learning clusters, leading to prolonged job queues and increased operational expenses. A promising solution to this inefficiency is GPU sharing, which improves resource utilization…
External link:
http://arxiv.org/abs/2410.07381
Author:
Dong, Honghua, Su, Qidong, Gao, Yubo, Li, Zhaoyu, Ruan, Yangjun, Pekhimenko, Gennady, Maddison, Chris J., Si, Xujie
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging…
External link:
http://arxiv.org/abs/2406.13161
Author:
Gao, Yubo, Haghifam, Maryam, Giannoula, Christina, Tu, Renbo, Pekhimenko, Gennady, Vijaykumar, Nandita
Deep learning (DL) models have revolutionized numerous domains, yet optimizing them for computational efficiency remains a challenging endeavor. Development of new DL models typically involves two parties: the model developers and performance optimizers…
External link:
http://arxiv.org/abs/2404.12512
Author:
Giannoula, Christina, Yang, Peiming, Vega, Ivan Fernandez, Yang, Jiacheng, Durvasula, Sankeerth, Li, Yu Xin, Sadrosadati, Mohammad, Luna, Juan Gomez, Mutlu, Onur, Pekhimenko, Gennady
Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total time, being significantly bottlenecked…
External link:
http://arxiv.org/abs/2402.16731
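The memory-intensive aggregation kernel this abstract refers to can be illustrated with a minimal sketch in plain Python. This is a toy, not code from the paper: function and variable names are ours, and real systems run this gather/scatter pattern on GPUs over millions of edges.

```python
# Minimal sketch of the memory-intensive neighbor-aggregation
# (gather/scatter) step in GNN execution. Illustrative only.

def aggregate(features, edges, num_nodes):
    """Sum each node's neighbor features (mean/max are also common).

    features: list of per-node feature vectors
    edges: list of (src, dst) pairs; dst aggregates from src
    """
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        # The irregular, data-dependent reads of features[src] are what
        # make this kernel memory-bound rather than compute-bound.
        for k in range(dim):
            out[dst][k] += features[src][k]
    return out

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 2), (1, 2), (2, 0)]
result = aggregate(features, edges, 3)
# Node 2 sums the features of nodes 0 and 1: [4.0, 6.0]
```

Each edge triggers a read at an address determined only at run time, which is why this phase, rather than the dense matrix multiplies, dominates GNN execution time.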
Modern workloads demand increasingly large memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this trend by utilizing traditional DRAM alongside slow-tier CXL-memory devices…
External link:
http://arxiv.org/abs/2312.04789
Author:
Yang, Jiacheng, Giannoula, Christina, Wu, Jun, Elhoushi, Mostafa, Gleeson, James, Pekhimenko, Gennady
Sparse Convolution (SC) is widely used for processing 3D point clouds, which are inherently sparse. Unlike dense convolution, SC preserves the sparsity of the input point cloud by allowing outputs only at specific locations. To efficiently compute…
External link:
http://arxiv.org/abs/2401.06145
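The core idea of sparse convolution can be sketched with a toy 1-D, submanifold-style example, assuming a 3-tap kernel; names here are hypothetical and real SC libraries operate on 3-D coordinates and precompute a kernel map of (input, output, offset) pairs.

```python
# Toy 1-D sparse convolution: outputs are computed only at already-active
# coordinates, so the sparsity pattern of the input is preserved.

def sparse_conv1d(active, kernel):
    """active: dict {coordinate: value}; kernel: 3 weights for offsets -1, 0, 1."""
    out = {}
    for c in active:                     # iterate only active output sites
        s = 0.0
        for off, w in zip((-1, 0, 1), kernel):
            v = active.get(c + off)      # gather only inputs that exist
            if v is not None:
                s += w * v
        out[c] = s
    return out

inp = {0: 1.0, 1: 2.0, 10: 3.0}          # mostly-empty coordinate space
res = sparse_conv1d(inp, (0.5, 1.0, 0.5))
# res == {0: 2.0, 1: 2.5, 10: 3.0}
```

A dense convolution over the same range would visit every coordinate from 0 to 10; the sparse version touches only the three active sites, which is where the efficiency of SC on point clouds comes from.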
Static and dynamic computational graphs represent two distinct approaches to constructing deep learning frameworks. The former prioritizes compiler-based optimizations, while the latter focuses on programmability and user-friendliness. The recent release…
External link:
http://arxiv.org/abs/2310.20078
Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines. However, LLM execution is inherently sequential, since they produce only one token at a time, thus incurring low…
External link:
http://arxiv.org/abs/2310.18813
Author:
Tu, Renbo, White, Colin, Kossaifi, Jean, Bonev, Boris, Kovachki, Nikola, Pekhimenko, Gennady, Azizzadenesheli, Kamyar, Anandkumar, Anima
Neural operators, such as Fourier Neural Operators (FNO), form a principled approach for learning solution operators for PDEs and other mappings between function spaces. However, many real-world problems require high-resolution training data…
External link:
http://arxiv.org/abs/2307.15034
Published in:
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2023
Stream processing engines (SPEs) are widely used for large-scale streaming analytics over unbounded, time-ordered data streams. Modern streaming analytics applications exhibit diverse compute characteristics and demand strict latency and throughput…
External link:
http://arxiv.org/abs/2301.12030