Výsledky vyhledávání - "Wang Zhongfeng"

Report

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format

Autor: Fang, Chao, Shi, Man, Geens, Robin, Symons, Arne, Wang, Zhongfeng, Verhelst, Marian

The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintaining accuracy. However, this shifts the energy and

Externí odkaz: http://arxiv.org/abs/2411.15982

Zobrazit plný text záznamu

Report

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

Autor: Liu, Xinyan, Shi, Huihong, Xu, Yang, Wang, Zhongfeng

Transformer-based diffusion models, dubbed Diffusion Transformers (DiTs), have achieved state-of-the-art performance in image and video generation tasks. However, their large model size and slow inference speed limit their practical applications, cal

Externí odkaz: http://arxiv.org/abs/2411.14172

Zobrazit plný text záznamu

Report

M$^2$-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization

Autor: Liang, Yanbiao, Shi, Huihong, Wang, Zhongfeng

Although Vision Transformers (ViTs) have achieved significant success, their intensive computations and substantial memory overheads challenge their deployment on edge devices. To address this, efficient ViTs have emerged, typically featuring Convolu

Externí odkaz: http://arxiv.org/abs/2410.09113

Zobrazit plný text záznamu

Report

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Autor: Ma, Shaobo, Fang, Chao, Shao, Haikuo, Wang, Zhongfeng

Large language models (LLMs) have been widely applied but face challenges in efficient inference. While quantization methods reduce computational demands, ultra-low bit quantization with arbitrary precision is hindered by limited GPU Tensor Core supp

Externí odkaz: http://arxiv.org/abs/2409.17870

Zobrazit plný text záznamu

Report

SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference

Autor: Wang, Chuanning, Fang, Chao, Wu, Xiao, Wang, Zhongfeng, Lin, Jun

Deploying deep neural networks (DNNs) on those resource-constrained edge platforms is hindered by their substantial computation and storage demands. Quantized multi-precision DNNs, denoted as MP-DNNs, offer a promising solution for these limitations

Externí odkaz: http://arxiv.org/abs/2409.14017

Zobrazit plný text záznamu

Report

A High-Throughput Hardware Accelerator for Lempel-Ziv 4 Compression Algorithm

Autor: Chen, Tao, Song, Suwen, Wang, Zhongfeng

This paper delves into recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm, highlighting two key factors that limit the throughput of single-kernel compressors. Firstly, the actual parallelism exhibited in single-kernel designs falls

Externí odkaz: http://arxiv.org/abs/2409.12433

Zobrazit plný text záznamu

Report

NASH: Neural Architecture and Accelerator Search for Multiplication-Reduced Hybrid Models

Autor: Xu, Yang, Shi, Huihong, Wang, Zhongfeng

The significant computational cost of multiplications hinders the deployment of deep neural networks (DNNs) on edge devices. While multiplication-free models offer enhanced hardware efficiency, they typically sacrifice accuracy. As a solution, multip

Externí odkaz: http://arxiv.org/abs/2409.04829

Zobrazit plný text záznamu

Report

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Autor: Ji, Yuhao, Fang, Chao, Ma, Shaobo, Shao, Haikuo, Wang, Zhongfeng

Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model siz

Externí odkaz: http://arxiv.org/abs/2407.12070

Zobrazit plný text záznamu

Report

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

Autor: Shi, Huihong, Cheng, Xin, Mao, Wendong, Wang, Zhongfeng

Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quan

Externí odkaz: http://arxiv.org/abs/2405.19915

Zobrazit plný text záznamu

Report

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Autor: Shi, Huihong, Shao, Haikuo, Mao, Wendong, Wang, Zhongfeng

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model si

Externí odkaz: http://arxiv.org/abs/2405.03882

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání