Showing 1 - 10 of 42 for search: '"Zeng, ZhanPeng"'
Author:
Zhong, Yunshan, Zhou, Yuyao, Zhang, Yuxin, Li, Shen, Li, Yong, Chao, Fei, Zeng, Zhanpeng, Ji, Rongrong
Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community. Recently, the unique architecture of v…
External link:
http://arxiv.org/abs/2412.16553
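A rough, generic sketch of the data-free idea named above (not the method of this particular paper, whose description is cut off in the snippet): weights are mapped to low-bit integers using only their own statistics, so no real data is needed. Shapes and values below are purely illustrative.

    import numpy as np

    def quantize_int8(w):
        # Symmetric uniform quantization driven only by the weight tensor's own
        # range -- no real calibration data is involved.
        scale = np.abs(w).max() / 127.0 + 1e-12
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(64, 128).astype(np.float32)   # toy pretrained weights
    q, s = quantize_int8(w)
    print("max reconstruction error:", np.abs(dequantize(q, s) - w).max())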
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.
External link:
http://arxiv.org/abs/2406.09416
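The snippet mentions time-dependent layer normalization; a common way to make normalization depend on the diffusion timestep is to predict the per-feature scale and shift from a timestep embedding. The sketch below is a generic version of that idea, not the cited paper's specific design; the embedding size and the small MLP are assumptions.

    import numpy as np

    def timestep_embedding(t, dim=32):
        # Sinusoidal embedding of a scalar timestep (an assumed, standard choice).
        freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
        return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

    def time_dependent_layernorm(x, t, w1, w2):
        # LayerNorm whose gain and bias are predicted from the timestep embedding.
        emb = np.maximum(timestep_embedding(t) @ w1, 0.0)   # tiny MLP with ReLU
        gamma, beta = np.split(emb @ w2, 2)                  # per-feature scale/shift
        mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + 1e-5) * (1.0 + gamma) + beta

    d = 16                                                   # illustrative feature size
    rng = np.random.default_rng(0)
    w1 = rng.normal(size=(32, 64)) * 0.1
    w2 = rng.normal(size=(64, 2 * d)) * 0.1
    x = rng.normal(size=(8, d))                              # 8 tokens, d features
    print(time_dependent_layernorm(x, 250, w1, w2).shape)    # (8, 16)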
GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use of low bit…
External link:
http://arxiv.org/abs/2403.07339
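To make the "low bit" strategy alluded to above concrete (a generic sketch, not this paper's specific scheme): both GEMM operands are quantized to int8, the multiply-accumulate runs in integer arithmetic, and a single rescale recovers an approximation of the float result.

    import numpy as np

    def to_int8(x):
        # Per-tensor symmetric quantization (an illustrative choice).
        scale = np.abs(x).max() / 127.0 + 1e-12
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

    rng = np.random.default_rng(0)
    a = rng.normal(size=(256, 512)).astype(np.float32)
    b = rng.normal(size=(512, 128)).astype(np.float32)

    a_q, a_s = to_int8(a)
    b_q, b_s = to_int8(b)

    # Integer GEMM with int32 accumulation, then one dequantization step.
    c_int = a_q.astype(np.int32) @ b_q.astype(np.int32)
    c_approx = c_int.astype(np.float32) * (a_s * b_s)

    print("relative error:", np.linalg.norm(c_approx - a @ b) / np.linalg.norm(a @ b))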
While GPU clusters are the de facto choice for training large deep neural network (DNN) models today, several reasons including ease of workflow, security and cost have led to efforts investigating whether CPUs may be viable for inference in routine…
External link:
http://arxiv.org/abs/2403.07221
Transformers are the backbone of powerful foundation models for many Vision and Natural Language Processing tasks. But their compute and memory/storage footprint is large, and so, serving such models is expensive, often requiring high-end hardware. To…
External link:
http://arxiv.org/abs/2403.06082
Author:
Zeng, Zhanpeng, Hawkins, Cole, Hong, Mingyi, Zhang, Aston, Pappas, Nikolaos, Singh, Vikas, Zheng, Shuai
Transformers are central in modern natural language processing and computer vision applications. Despite recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra long sequences (e.g.…
External link:
http://arxiv.org/abs/2305.04241
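The quadratic cost mentioned in this snippet is easy to make concrete: the attention score matrix has one entry per token pair, so its size grows with the square of the sequence length. The lengths below are illustrative, not taken from the paper.

    # Memory for a single fp16 attention score matrix (per head):
    # n tokens -> n * n entries of 2 bytes each.
    for n in (1024, 16384, 65536):
        mib = n * n * 2 / 2**20
        print(f"n = {n:6d}: {mib:8.1f} MiB of scores per head")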
Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix,…
External link:
http://arxiv.org/abs/2207.10284
Author:
Li, Shuyuan, Li, Shuying, Yang, Dawen, Zhang, Jingtao, Wang, Songyang, Zeng, Zhanpeng, Cai, Qunbin, Zhou, Qishi
Published in:
In: Bone, March 2025, Vol. 192
Author:
Zeng, Zhanpeng, Xiong, Yunyang, Ravi, Sathya N., Acharya, Shailesh, Fung, Glenn, Singh, Vikas
Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the se…
External link:
http://arxiv.org/abs/2111.09714
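For reference, a minimal single-head, unbatched form of the self-attention computation described in this snippet; the n x n score matrix over all token pairs is where the quadratic dependence on sequence length comes from. Sizes are illustrative.

    import numpy as np

    def self_attention(x, wq, wk, wv):
        # Single-head scaled dot-product attention; `scores` is n x n.
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.T / np.sqrt(q.shape[-1])          # one score per token pair
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)        # row-wise softmax
        return weights @ v

    n, d = 128, 64                                       # illustrative sizes
    rng = np.random.default_rng(0)
    x = rng.normal(size=(n, d))
    wq, wk, wv = [rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3)]
    print(self_attention(x, wq, wk, wv).shape)           # (128, 64)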
Author:
Xiong, Yunyang, Zeng, Zhanpeng, Chakraborty, Rudrasis, Tan, Mingxing, Fung, Glenn, Li, Yin, Singh, Vikas
Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of ot…
External link:
http://arxiv.org/abs/2102.03902