Showing 1 - 10 of 202 results for the search: '"Abdelfattah, Mohamed S."'
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators. This type of sparsity at the bit-level is especially interesting because it is both orthogonal and compatible with …
External link:
http://arxiv.org/abs/2409.05227
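The entry above describes skipping ineffectual (zero) weight bits in a bit-serial accelerator. A minimal sketch of that idea, assuming a simple shift-and-add bit-serial multiplier (the function name and 8-bit width are illustrative, not from the paper):

```python
def bitserial_mul(a: int, w: int, width: int = 8):
    """Multiply a * w one weight bit per cycle, skipping zero bits.

    In a bit-serial accelerator each processed bit costs a cycle, so
    skipping the ineffectual zero bits makes the cycle count equal to
    the number of set bits in w rather than the full bit-width.
    """
    acc, cycles = 0, 0
    for i in range(width):
        if (w >> i) & 1:       # zero bits are skipped entirely
            acc += a << i      # shift-and-add for each effectual bit
            cycles += 1
    return acc, cycles

product, cycles = bitserial_mul(3, 0b00010010)  # w = 18 has only 2 set bits
# product == 54 (= 3 * 18), computed in 2 cycles instead of 8
```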
FPGAs offer a flexible platform for accelerating deep neural network (DNN) inference, particularly for non-uniform workloads featuring fine-grained unstructured sparsity and mixed arithmetic precision. To leverage these redundancies, an emerging approach …
External link:
http://arxiv.org/abs/2407.06033
Author:
Akhauri, Yash, AbouElhamayed, Ahmed F, Dotzel, Jordan, Zhang, Zhiru, Rush, Alexander M, Huda, Safeen, Abdelfattah, Mohamed S
The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated efficiency techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs …
External link:
http://arxiv.org/abs/2406.16635
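The contextual-sparsity idea above can be illustrated with a toy ReLU feed-forward layer: which neurons are active depends on the input, so the inactive rows can be skipped per token. A hedged NumPy sketch (all names invented here; not the paper's system):

```python
import numpy as np

def ffn_contextual_sparse(x, W_up, W_down):
    """ReLU feed-forward layer that skips input-dependent inactive neurons.

    Neurons with non-positive pre-activation output exactly zero after
    ReLU, so their rows of W_down contribute nothing and can be skipped.
    The active set changes with every input x: contextual sparsity.
    """
    pre = x @ W_up                     # pre-activations, shape (hidden,)
    active = pre > 0                   # which neurons fire for THIS input
    h = pre[active]                    # keep only the active activations
    return h @ W_down[active], active  # skip inactive rows of W_down

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W_up = rng.standard_normal((16, 64))
W_down = rng.standard_normal((64, 16))
y, active = ffn_contextual_sparse(x, W_up, W_down)
dense = np.maximum(x @ W_up, 0.0) @ W_down
assert np.allclose(y, dense)           # identical output, fewer multiplies
```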
Autor:
Dotzel, Jordan, Chen, Yuzong, Kotb, Bahaa, Prasad, Sushma, Wu, Gang, Li, Sheng, Abdelfattah, Mohamed S., Zhang, Zhiru
The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost of …
External link:
http://arxiv.org/abs/2405.03103
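The entry contrasts integer formats with Normal Float (NF4). NF4 is a 4-bit codebook whose 16 levels are standard-normal quantiles; a sketch of absmax codebook quantization (level values reproduced, rounded, from the published QLoRA NF4 table; function names are mine, not from this paper):

```python
import numpy as np

# The 16 NF4 levels: standard-normal quantiles scaled to [-1, 1]
# (rounded to 4 decimals here).
NF4 = np.array([-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848,
                -0.0911, 0.0, 0.0796, 0.1609, 0.2461, 0.3379,
                0.4407, 0.5626, 0.7230, 1.0])

def nf4_quantize(block):
    """Quantize a weight block to 4-bit NF4 codes plus one absmax scale."""
    scale = np.abs(block).max()
    codes = np.abs(block / scale - NF4[:, None]).argmin(axis=0)
    return codes.astype(np.uint8), scale

def nf4_dequantize(codes, scale):
    """Map 4-bit codes back to real values."""
    return NF4[codes] * scale

codes, scale = nf4_quantize(np.array([2.0, -2.0, 0.0, 0.5]))
# the extreme values and zero are represented exactly; 0.5 snaps to a level
```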
Author:
Akhauri, Yash, Abdelfattah, Mohamed S.
Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency …
External link:
http://arxiv.org/abs/2403.02484
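The traditional predictor encoding alluded to above flattens a cell's adjacency matrix alongside one-hot operation labels into a fixed-length vector. A toy sketch (the 5-node cell and op names are invented for illustration):

```python
import numpy as np

# A NAS cell as a DAG: adjacency matrix (which node feeds which)
# plus an operation label per node.
ops = ["input", "conv3x3", "conv1x1", "maxpool", "output"]
adj = np.array([[0, 1, 1, 0, 0],
                [0, 0, 0, 1, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 0, 0]])

# One-hot encode the ops and concatenate with the flattened adjacency:
# this fixed-length vector is what an accuracy predictor consumes.
op_vocab = {op: i for i, op in enumerate(ops)}
one_hot = np.eye(len(op_vocab))[[op_vocab[o] for o in ops]]
encoding = np.concatenate([adj.ravel(), one_hot.ravel()])
# encoding.shape == (50,): 25 adjacency entries + 5 nodes x 5 op classes
```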
Author:
Akhauri, Yash, Abdelfattah, Mohamed S.
Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific …
External link:
http://arxiv.org/abs/2403.02446
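Hardware-aware NAS as described above couples an accuracy objective with a device latency constraint; one simple version uses a per-operator latency lookup table. A toy sketch (all operator names, latencies, and accuracies are hypothetical):

```python
# Per-operator latencies measured on one target device (hypothetical ms).
latency_lut = {"conv3x3": 1.8, "conv1x1": 0.6, "maxpool": 0.3}

# Candidate architectures with their (predicted) accuracies.
candidates = [
    (["conv3x3", "conv3x3"], 0.94),   # accurate but slow
    (["conv1x1", "maxpool"], 0.90),   # faster, slightly less accurate
]

# Keep only candidates whose summed op latency meets the budget,
# then pick the most accurate one among them.
budget_ms = 2.0
feasible = [(arch, acc) for arch, acc in candidates
            if sum(latency_lut[op] for op in arch) <= budget_ms]
best_arch, best_acc = max(feasible, key=lambda t: t[1])
# only the second candidate fits the 2 ms budget, so it is selected
```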
Deep neural network (DNN) inference has become an important part of many data-center workloads. This has prompted focused efforts to design ever-faster deep learning accelerators such as GPUs and TPUs. However, an end-to-end DNN-based vision application …
External link:
http://arxiv.org/abs/2403.12981
Author:
Hunter, Rosco, Dudziak, Łukasz, Abdelfattah, Mohamed S., Mehrotra, Abhinav, Bhattacharya, Sourav, Wen, Hongkai
Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When …
External link:
http://arxiv.org/abs/2401.01008
Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs given the current FPGA architecture and conventional accelerator …
External link:
http://arxiv.org/abs/2311.02758
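Mixed-precision quantization as in the entry above assigns each layer its own bit-width. A minimal sketch using symmetric uniform quantization (layer names and bit assignments are illustrative, not from the paper):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of w to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

# Mixed precision: sensitive layers keep more bits, robust ones fewer.
rng = np.random.default_rng(0)
layer_bits = {"first_conv": 8, "middle_block": 4, "classifier": 8}
weights = {name: rng.standard_normal(256) for name in layer_bits}
quantized = {name: quantize_uniform(weights[name], bits)
             for name, bits in layer_bits.items()}
# 4-bit layers use at most 15 distinct values, 8-bit layers up to 255
```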
Author:
Dotzel, Jordan, Wu, Gang, Li, Andrew, Umar, Muhammad, Ni, Yun, Abdelfattah, Mohamed S., Zhang, Zhiru, Cheng, Liqun, Dixon, Martin G., Jouppi, Norman P., Le, Quoc V., Li, Sheng
Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With improved numerical support in recent hardware, including multiple …
External link:
http://arxiv.org/abs/2308.03290