Showing 1 - 10 of 62 for search: '"Zhang Qingru"'
Author:
Zhang, Qingru, Yu, Xiaodong, Singh, Chandan, Liu, Xiaodong, Liu, Liyuan, Gao, Jianfeng, Zhao, Tuo, Roth, Dan, Cheng, Hao
Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks. However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or halluc…
External link:
http://arxiv.org/abs/2409.10790
Author:
Bukharin, Alexander, Hong, Ilgee, Jiang, Haoming, Li, Zichong, Zhang, Qingru, Zhang, Zixuan, Zhao, Tuo
Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, lack of training, etc., human annotators may give incorr…
External link:
http://arxiv.org/abs/2406.15568
Author:
Kang, Hao, Zhang, Qingru, Kundu, Souvik, Jeong, Geonhwa, Liu, Zaoxing, Krishna, Tushar, Zhao, Tuo
Key-value (KV) caching has become the de-facto technique to accelerate generation speed for large language model (LLM) inference. However, the growing cache demand with increasing sequence length has turned LLM inference into a memory-bound problem, si…
External link:
http://arxiv.org/abs/2403.05527
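The snippet above points at why the KV cache becomes the memory bottleneck: keys and values each store one vector per token, per head, per layer, so the cache grows linearly with sequence length. A back-of-the-envelope sketch (the model dimensions below are hypothetical, not taken from the paper):

```python
def kv_cache_size_bytes(batch, layers, heads, head_dim, seq_len, dtype_bytes=2):
    # Keys and values each hold one head_dim vector per token, per head, per layer;
    # the leading factor of 2 accounts for K and V.
    return 2 * batch * layers * heads * head_dim * seq_len * dtype_bytes

# Hypothetical 7B-class configuration: 32 layers, 32 heads, head dim 128, fp16.
short_ctx = kv_cache_size_bytes(1, 32, 32, 128, 1_024)   # 0.5 GiB
long_ctx = kv_cache_size_bytes(1, 32, 32, 128, 32_768)   # 16 GiB
print(short_ctx / 2**30, long_ctx / 2**30)
```

At a 32K context the cache alone reaches tens of GiB, which is why compression or eviction of cached keys and values becomes attractive.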
In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for the readers to grasp the conveyed information. When interacting with large la…
External link:
http://arxiv.org/abs/2311.02262
Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full)…
External link:
http://arxiv.org/abs/2310.12442
Author:
Bukharin, Alexander, Li, Yan, Yu, Yue, Zhang, Qingru, Chen, Zhehui, Zuo, Simiao, Zhang, Chao, Zhang, Songan, Zhao, Tuo
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern…
External link:
http://arxiv.org/abs/2310.10810
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse…
External link:
http://arxiv.org/abs/2306.11222
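The LoSparse snippet describes approximating a weight matrix as a low-rank factor plus a sparse residual. A minimal sketch of that decomposition idea, using truncated SVD and magnitude pruning (an illustration of the general technique, not the paper's algorithm):

```python
import numpy as np

def low_rank_plus_sparse(W, rank, keep_frac):
    # Low-rank part: truncated SVD of W.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Sparse part: keep only the largest-magnitude entries of the residual.
    R = W - L
    k = max(1, int(keep_frac * R.size))
    thresh = np.sort(np.abs(R), axis=None)[-k]
    S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L, S = low_rank_plus_sparse(W, rank=8, keep_frac=0.05)
# L and S together store far fewer effective parameters than W while approximating it.
```

Storing the two factors of L plus the nonzeros of S needs far less memory than the dense W, which is the compression angle the abstract gestures at.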
Author:
Zhang, Qingru, Chen, Minshuo, Bukharin, Alexander, Karampatziakis, Nikos, He, Pengcheng, Cheng, Yu, Chen, Weizhu, Zhao, Tuo
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream t…
External link:
http://arxiv.org/abs/2303.10512
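The snippet above motivates parameter-efficient fine-tuning: instead of updating all weights, one trains a small low-rank update on top of a frozen matrix, the low-rank adaptation idea this line of work builds on. A minimal sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8                      # hidden size and adapter rank (r << d)
W = rng.standard_normal((d, d))    # frozen pre-trained weight
B = np.zeros((d, r))               # trainable; zero-initialized so the update starts at 0
A = rng.standard_normal((r, d))    # trainable

def adapted_forward(x):
    # Effective weight is W + B @ A; only A and B would receive gradients.
    return x @ (W + B @ A).T

trainable = A.size + B.size        # 12,288 parameters
full = W.size                      # 589,824 parameters
print(trainable / full)            # ~0.02: about 2% of full fine-tuning
```

Because B starts at zero, the adapted model is initially identical to the pre-trained one, and each downstream task only needs to store its own small A and B.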
Layer-wise distillation is a powerful tool to compress large models (i.e., teacher models) into small ones (i.e., student models). The student distills knowledge from the teacher by mimicking the hidden representations of the teacher at every intermed…
External link:
http://arxiv.org/abs/2210.01351
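The snippet above describes the core of layer-wise distillation: the student is trained to match the teacher's intermediate hidden states, not just its final outputs. A minimal sketch of such a matching loss, one mean-squared-error term per layer (an illustration of the general recipe, not this paper's specific objective):

```python
import numpy as np

def layerwise_distill_loss(student_hiddens, teacher_hiddens):
    # One MSE term per intermediate layer, averaged over layers.
    assert len(student_hiddens) == len(teacher_hiddens)
    per_layer = [np.mean((s - t) ** 2)
                 for s, t in zip(student_hiddens, teacher_hiddens)]
    return float(np.mean(per_layer))

rng = np.random.default_rng(0)
teacher = [rng.standard_normal((4, 16)) for _ in range(3)]
print(layerwise_distill_loss(teacher, teacher))  # 0.0 when representations match
```

In practice the student's hidden size usually differs from the teacher's, so each student representation is first passed through a learned projection before the comparison; that projection is omitted here for brevity.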
Author:
Zhang, Qingru, Zuo, Simiao, Liang, Chen, Bukharin, Alexander, He, Pengcheng, Chen, Weizhu, Zhao, Tuo
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain enormous amounts of parameters, which restrict their deployment to real-world applicati…
External link:
http://arxiv.org/abs/2206.12562