Showing 1 - 5 of 5
for search: '"Zhang, Chengruidong"'
Author:
Liu, Di, Chen, Meng, Lu, Baotong, Jiang, Huiqiang, Han, Zhenhua, Zhang, Qianxi, Chen, Qi, Zhang, Chengruidong, Ding, Bailu, Zhang, Kai, Chen, Chen, Yang, Fan, Yang, Yuqing, Qiu, Lili
Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference latency and high GPU memory consumption…
External link:
http://arxiv.org/abs/2409.10516
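The quadratic cost that this abstract (and several of the entries below) refers to can be illustrated with a minimal sketch of naive scaled dot-product attention in plain Python. This is an illustrative toy, not code from any of the listed papers: building the full n × n score matrix is what makes time and memory grow quadratically in sequence length n.

```python
import math
import random

def naive_attention(Q, K, V):
    """Naive scaled dot-product attention over Python lists.
    The inner loop scores each query against every key, so the
    whole computation is O(n^2 * d) in time and O(n) extra memory
    per row (O(n^2) if all score rows were materialized at once)."""
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        # Score query i against all n keys -- the quadratic term.
        scores = [sum(Q[i][k] * K[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        # Numerically stable softmax over the score row.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(weights[j] * V[j][k] for j in range(n))
                    for k in range(d)])
    return out

random.seed(0)
n, d = 64, 8
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
out = naive_attention(Q, K, V)
print(len(out), len(out[0]))  # 64 8
```

Doubling n quadruples the number of query–key scores computed, which is the scaling barrier the listed papers target from different angles (retrieval-based sparsity, prefill acceleration, extended position encodings, dynamic-sparsity compilation).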
Author:
Jiang, Huiqiang, Li, Yucheng, Zhang, Chengruidong, Wu, Qianhui, Luo, Xufang, Ahn, Surin, Han, Zhenhua, Abdi, Amir H., Li, Dongsheng, Lin, Chin-Yew, Yang, Yuqing, Qiu, Lili
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes…
External link:
http://arxiv.org/abs/2407.02490
Author:
Lin, Chaofan, Han, Zhenhua, Zhang, Chengruidong, Yang, Yuqing, Yang, Fan, Chen, Chen, Qiu, Lili
The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants could de…
External link:
http://arxiv.org/abs/2405.19888
Author:
Ding, Yiran, Zhang, Li Lyna, Zhang, Chengruidong, Xu, Yuanyuan, Shang, Ning, Xu, Jiahang, Yang, Fan, Yang, Mao
A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to ar…
External link:
http://arxiv.org/abs/2402.13753
Author:
Zheng, Ningxin, Jiang, Huiqiang, Zhang, Quanlu, Han, Zhenhua, Yang, Yuqing, Ma, Lingxiao, Yang, Fan, Zhang, Chengruidong, Qiu, Lili, Yang, Mao, Zhou, Lidong
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to significant…
External link:
http://arxiv.org/abs/2301.10936