Showing 1 - 10 of 673 for the search: '"Abdelfattah, Mohamed"'
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators. This type of sparsity at the bit-level is especially interesting because it is both orthogonal and compatible w…
External link:
http://arxiv.org/abs/2409.05227
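The snippet above describes bit-serial execution, where a weight is consumed one bit per cycle and zero bits contribute nothing to the product. A minimal sketch of that idea (not the paper's actual accelerator design) is a shift-add dot product that only spends cycles on set bits:

```python
def bit_serial_dot(activations, weights, bits=8):
    """Dot product where each weight is processed one bit at a time.

    Only the set (nonzero) bits of each weight trigger a shift-add,
    so the cycle count scales with the number of effectual bits
    rather than with the full bit width.
    """
    acc = 0
    cycles = 0
    for a, w in zip(activations, weights):
        for b in range(bits):
            if (w >> b) & 1:       # skip ineffectual zero bits
                acc += a << b      # shift-add for an effectual bit
                cycles += 1
    return acc, cycles

# Sparse-bit weights need far fewer cycles than bits * len(weights).
result, cycles = bit_serial_dot([3, 1, 2], [0b1000, 0b0001, 0b0101])
assert result == 3 * 8 + 1 * 1 + 2 * 5
assert cycles == 4   # only four set bits across all three weights
```

In hardware the same principle lets a bit-serial PE finish early on bit-sparse weights, which is why bit-level sparsity composes with the value-level (whole-zero) sparsity most accelerators already exploit.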
Author:
Chang, Chi-Chih, Lin, Wei-Cheng, Lin, Chien-Yu, Chen, Chong-Yan, Hu, Yu-Fang, Wang, Pei-Shuo, Huang, Ning-Chi, Ceze, Luis, Abdelfattah, Mohamed S., Wu, Kai-Chiang
Post-training KV-Cache compression methods typically either sample a subset of effectual tokens or quantize the data into lower numerical bit width. However, these methods cannot exploit redundancy in the hidden dimension of the KV tensors. This pape…
External link:
http://arxiv.org/abs/2407.21118
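The snippet contrasts token sampling and quantization with compressing the hidden dimension itself. A minimal NumPy sketch of that direction (a generic truncated SVD, not the paper's specific method) caches a low-rank latent per token plus one shared projection:

```python
import numpy as np

def low_rank_compress(kv, rank):
    """Compress a (tokens, hidden) KV block along the hidden dimension.

    Illustrative only: a truncated SVD keeps the top `rank` components,
    exploiting redundancy across the hidden dimension rather than
    dropping tokens or lowering bit width.
    """
    u, s, vt = np.linalg.svd(kv, full_matrices=False)
    latent = u[:, :rank] * s[:rank]   # (tokens, rank) -- what gets cached
    proj = vt[:rank]                  # (rank, hidden) -- shared projection
    return latent, proj

rng = np.random.default_rng(0)
# A low-rank KV block: most energy lives in a few hidden directions.
kv = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))
latent, proj = low_rank_compress(kv, rank=8)
assert np.allclose(latent @ proj, kv, atol=1e-6)
```

Caching `latent` instead of `kv` shrinks the per-token footprint from `hidden` to `rank` values, and the approach is orthogonal to quantizing whatever is cached.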
FPGAs offer a flexible platform for accelerating deep neural network (DNN) inference, particularly for non-uniform workloads featuring fine-grained unstructured sparsity and mixed arithmetic precision. To leverage these redundancies, an emerging appr…
External link:
http://arxiv.org/abs/2407.06033
Author:
Akhauri, Yash, AbouElhamayed, Ahmed F, Dotzel, Jordan, Zhang, Zhiru, Rush, Alexander M, Huda, Safeen, Abdelfattah, Mohamed S
The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated efficiency techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs b…
External link:
http://arxiv.org/abs/2406.16635
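Contextual sparsity, as the snippet describes it, means the set of active neurons is chosen per input rather than fixed at compile time. A toy sketch of the common predictor-gated FFN pattern (the matrix `predictor_w` and the `keep` ratio are hypothetical stand-ins, not this paper's components):

```python
import numpy as np

def contextual_ffn(x, w_in, w_out, predictor_w, keep=0.25):
    """FFN layer that only evaluates neurons a cheap predictor marks active.

    `predictor_w` plays the role of a small learned scorer: it ranks the
    hidden neurons for the current input, and only the top `keep`
    fraction is actually computed, so the sparsity pattern is
    input-dependent (contextual) rather than static.
    """
    scores = x @ predictor_w                      # cheap proxy scores
    k = max(1, int(keep * w_in.shape[1]))
    active = np.argsort(scores)[-k:]              # input-dependent subset
    h = np.maximum(x @ w_in[:, active], 0.0)      # compute only active neurons
    return h @ w_out[active]

rng = np.random.default_rng(1)
d, hidden = 16, 64
x = rng.standard_normal(d)
y = contextual_ffn(x, rng.standard_normal((d, hidden)),
                   rng.standard_normal((hidden, d)),
                   rng.standard_normal((d, hidden)))
assert y.shape == (d,)
```

The design tension this exposes is the one the abstract hints at: the predictor must be far cheaper than the neurons it prunes, or the gating overhead eats the savings.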
Author:
Dotzel, Jordan, Chen, Yuzong, Kotb, Bahaa, Prasad, Sushma, Wu, Gang, Li, Sheng, Abdelfattah, Mohamed S., Zhang, Zhiru
The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost…
External link:
http://arxiv.org/abs/2405.03103
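The integer-versus-NF4 trade-off in the snippet comes down to where a 4-bit format places its 16 representable levels. A small sketch that quantizes weights against two codebooks — the nonuniform levels below are illustrative (denser near zero, in the spirit of NF4), not the exact NF4 table:

```python
import numpy as np

def quantize(x, levels):
    """Absmax-scale x, then snap each entry to its nearest codebook level."""
    scale = np.max(np.abs(x))
    idx = np.abs(x[:, None] / scale - levels[None, :]).argmin(axis=1)
    return levels[idx] * scale, scale

# 16 evenly spaced levels stand in for a 4-bit integer format.
int4_like = np.linspace(-1.0, 1.0, 16)
# A nonuniform 16-level codebook, denser near zero where normally
# distributed weights concentrate (illustrative, not the NF4 table).
t = np.linspace(-1.0, 1.0, 16)
nf4_like = np.sign(t) * t ** 2

w = np.random.default_rng(3).standard_normal(4096) * 0.05
for levels in (int4_like, nf4_like):
    q, scale = quantize(w, levels)
    mse = np.mean((w - q) ** 2)
    assert mse < np.var(w)   # quantization error well under signal power
```

The accuracy/cost tension the abstract mentions follows directly: nonuniform levels fit bell-shaped weights better, but integer levels map onto cheap fixed-point arithmetic while a lookup codebook does not.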
Author:
Dotzel, Jordan, Akhauri, Yash, AbouElhamayed, Ahmed S., Jiang, Carly, Abdelfattah, Mohamed, Zhang, Zhiru
Large language models (LLMs) often struggle with strict memory, latency, and power demands. To meet these demands, various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis. These methods improve over static…
External link:
http://arxiv.org/abs/2404.04900
Author:
Akhauri, Yash, Abdelfattah, Mohamed S.
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adja…
External link:
http://arxiv.org/abs/2403.02484
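The "adjacency" encoding the snippet is cut off on is the traditional baseline: a NAS cell is serialized as its flattened adjacency matrix plus a one-hot operation list, and a learned predictor maps that vector to estimated accuracy. A minimal sketch of that baseline encoding (the operation vocabulary is a made-up example):

```python
import numpy as np

def encode_architecture(adj, ops, op_vocab):
    """Flatten a cell's adjacency matrix and one-hot op list into a vector.

    This is the classic adjacency-based encoding; an accuracy predictor
    would take this fixed-length vector as its input features.
    """
    onehot = np.zeros((len(ops), len(op_vocab)))
    for i, op in enumerate(ops):
        onehot[i, op_vocab.index(op)] = 1.0
    return np.concatenate([np.asarray(adj, float).ravel(), onehot.ravel()])

vocab = ["conv3x3", "conv1x1", "maxpool"]       # hypothetical op set
adj = [[0, 1, 1],   # node 0 feeds nodes 1 and 2
       [0, 0, 1],   # node 1 feeds node 2
       [0, 0, 0]]
vec = encode_architecture(adj, ["conv3x3", "maxpool", "conv1x1"], vocab)
assert vec.shape == (3 * 3 + 3 * 3,)   # 9 adjacency + 9 one-hot entries
```

The known weakness of this encoding, which motivates alternatives, is that it is tied to one fixed graph size and treats structurally similar cells as unrelated vectors.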
Author:
Akhauri, Yash, Abdelfattah, Mohamed S.
Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a spec…
External link:
http://arxiv.org/abs/2403.02446
Deep neural network (DNN) inference has become an important part of many data-center workloads. This has prompted focused efforts to design ever-faster deep learning accelerators such as GPUs and TPUs. However, an end-to-end DNN-based vision applicat…
External link:
http://arxiv.org/abs/2403.12981
Traditional methods, such as JPEG, perform image compression by operating on structural information, such as pixel values or frequency content. These methods are effective down to bitrates around one bit per pixel (bpp) and higher at standard image sizes.
External link:
http://arxiv.org/abs/2402.13536