Showing 1 - 10 of 862 for search: '"Wang Zhongfeng"'
The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintaining accuracy. However, this shifts the energy and
External link:
http://arxiv.org/abs/2411.15982
Transformer-based diffusion models, dubbed Diffusion Transformers (DiTs), have achieved state-of-the-art performance in image and video generation tasks. However, their large model size and slow inference speed limit their practical applications, cal
External link:
http://arxiv.org/abs/2411.14172
Although Vision Transformers (ViTs) have achieved significant success, their intensive computations and substantial memory overheads challenge their deployment on edge devices. To address this, efficient ViTs have emerged, typically featuring Convolu
External link:
http://arxiv.org/abs/2410.09113
Large language models (LLMs) have been widely applied but face challenges in efficient inference. While quantization methods reduce computational demands, ultra-low bit quantization with arbitrary precision is hindered by limited GPU Tensor Core supp
External link:
http://arxiv.org/abs/2409.17870
Deploying deep neural networks (DNNs) on those resource-constrained edge platforms is hindered by their substantial computation and storage demands. Quantized multi-precision DNNs, denoted as MP-DNNs, offer a promising solution for these limitations
External link:
http://arxiv.org/abs/2409.14017
This paper delves into recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm, highlighting two key factors that limit the throughput of single-kernel compressors. Firstly, the actual parallelism exhibited in single-kernel designs falls
External link:
http://arxiv.org/abs/2409.12433
The significant computational cost of multiplications hinders the deployment of deep neural networks (DNNs) on edge devices. While multiplication-free models offer enhanced hardware efficiency, they typically sacrifice accuracy. As a solution, multip
External link:
http://arxiv.org/abs/2409.04829
Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment
Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model siz
External link:
http://arxiv.org/abs/2407.12070
Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quan
External link:
http://arxiv.org/abs/2405.19915
Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model si
External link:
http://arxiv.org/abs/2405.03882