Showing 1 - 5 of 5
for search: '"Zhang, Chengruidong"'
Author:
Liu, Di, Chen, Meng, Lu, Baotong, Jiang, Huiqiang, Han, Zhenhua, Zhang, Qianxi, Chen, Qi, Zhang, Chengruidong, Ding, Bailu, Zhang, Kai, Chen, Chen, Yang, Fan, Yang, Yuqing, Qiu, Lili
Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference latency and high GPU memory consumption…
External link:
http://arxiv.org/abs/2409.10516
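The quadratic cost that this abstract (and several of the entries below) refers to can be illustrated with a minimal sketch of naive scaled dot-product attention in plain Python. This is an illustrative toy, not code from any of the listed papers: building the full n × n score matrix is what makes time and memory grow quadratically in sequence length n.

```python
import math
import random

def naive_attention(Q, K, V):
    """Naive scaled dot-product attention over Python lists.
    The inner loop scores each query against every key, so the
    whole computation is O(n^2 * d) in time and O(n) extra memory
    per row (O(n^2) if all score rows were materialized at once)."""
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        # Score query i against all n keys -- the quadratic term.
        scores = [sum(Q[i][k] * K[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        # Numerically stable softmax over the score row.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(weights[j] * V[j][k] for j in range(n))
                    for k in range(d)])
    return out

random.seed(0)
n, d = 64, 8
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
out = naive_attention(Q, K, V)
print(len(out), len(out[0]))  # 64 8
```

Doubling n quadruples the number of query–key scores computed, which is the scaling barrier the listed papers target from different angles (retrieval-based sparsity, prefill acceleration, extended position encodings, dynamic-sparsity compilation).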
Author:
Jiang, Huiqiang, Li, Yucheng, Zhang, Chengruidong, Wu, Qianhui, Luo, Xufang, Ahn, Surin, Han, Zhenhua, Abdi, Amir H., Li, Dongsheng, Lin, Chin-Yew, Yang, Yuqing, Qiu, Lili
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes…
External link:
http://arxiv.org/abs/2407.02490
Author:
Lin, Chaofan, Han, Zhenhua, Zhang, Chengruidong, Yang, Yuqing, Yang, Fan, Chen, Chen, Qiu, Lili
The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants could de…
External link:
http://arxiv.org/abs/2405.19888
Author:
Ding, Yiran, Zhang, Li Lyna, Zhang, Chengruidong, Xu, Yuanyuan, Shang, Ning, Xu, Jiahang, Yang, Fan, Yang, Mao
A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to ar…
External link:
http://arxiv.org/abs/2402.13753
Author:
Zheng, Ningxin, Jiang, Huiqiang, Zhang, Quanlu, Han, Zhenhua, Yang, Yuqing, Ma, Lingxiao, Yang, Fan, Zhang, Chengruidong, Qiu, Lili, Yang, Mao, Zhou, Lidong
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to significant…
External link:
http://arxiv.org/abs/2301.10936