Showing 1 - 10 of 27 for search: '"Zheng, Zangwei"'
Author:
Qin, Ziheng, Xu, Zhaopan, Zhou, Yukun, Zheng, Zangwei, Cheng, Zebang, Tang, Hao, Shang, Lei, Sun, Baigui, Peng, Xiaojiang, Timofte, Radu, Yao, Hongxun, Wang, Kai, You, Yang
Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to …
External link:
http://arxiv.org/abs/2405.18347
The increase in parameter size of multimodal large language models (MLLMs) introduces significant capabilities, particularly in-context learning, where MLLMs enhance task performance without updating pre-trained parameters. This effectiveness, however, …
External link:
http://arxiv.org/abs/2404.12866
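As a minimal sketch of the in-context learning setting the abstract describes (not this paper's method), the model's weights stay frozen and the "learning" happens entirely through demonstrations placed in the prompt. `query_model` below is a hypothetical stand-in for any frozen (M)LLM inference call.

```python
# Minimal sketch of in-context learning: no gradient step, no parameter update.
# `query_model` is a hypothetical placeholder for a frozen (M)LLM inference call.

def build_few_shot_prompt(demonstrations, query):
    """Concatenate (input, label) demonstrations before the actual query."""
    lines = []
    for x, y in demonstrations:
        lines.append(f"Input: {x}\nLabel: {y}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

demos = [("the movie was wonderful", "positive"),
         ("a waste of two hours", "negative")]
prompt = build_few_shot_prompt(demos, "an unexpectedly moving finale")
# answer = query_model(prompt)  # task performance comes from the demonstrations alone
```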
Author:
Zhao, Xuanlei, Cheng, Shenggan, Chen, Chang, Zheng, Zangwei, Liu, Ziming, Yang, Zheming, You, Yang
Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under …
External link:
http://arxiv.org/abs/2403.10266
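To illustrate the general idea behind sequence parallelism (not this paper's specific scheme), the sketch below shards activations along the sequence axis so each worker only ever holds a fraction of a long sequence; shapes and the `world_size` split are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of sequence parallelism: shard activations along the
# sequence axis so each of `world_size` workers holds seq_len / world_size tokens.
batch, seq_len, hidden, world_size = 2, 8192, 1024, 4
activations = np.random.randn(batch, seq_len, hidden).astype(np.float32)

# Each worker keeps one contiguous shard of the sequence dimension.
shards = np.split(activations, world_size, axis=1)

# Token-wise ops (layer norm, MLP) run independently on each shard; attention
# across shards would additionally require communication (e.g. gathering
# keys/values), which is where the different parallelism approaches differ.
local_out = [shard * 2.0 for shard in shards]   # stand-in for a token-wise op
full_out = np.concatenate(local_out, axis=1)
assert full_out.shape == activations.shape
```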
Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited, as evidenced by open …
External link:
http://arxiv.org/abs/2403.00798
To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B …
External link:
http://arxiv.org/abs/2402.01739
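The core mechanism of an MoE layer in a decoder-only LLM is top-k expert routing: a gating network picks a few experts per token, so only a fraction of the parameters are active for each token. The toy sketch below illustrates that idea only; it is not OpenMoE's actual implementation, and all names are illustrative.

```python
import numpy as np

# Toy sketch of top-k expert routing (illustrative, not OpenMoE's implementation).
def moe_layer(x, gate_w, experts, k=2):
    """x: (tokens, hidden); gate_w: (hidden, n_experts); experts: list of (hidden, hidden)."""
    logits = x @ gate_w                                   # router scores per token
    topk = np.argsort(logits, axis=-1)[:, -k:]            # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel) / np.exp(sel).sum(axis=-1, keepdims=True)  # softmax over selected
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])  # only k experts run per token
    return out

tokens, hidden, n_experts = 4, 8, 4
x = np.random.randn(tokens, hidden)
gate_w = np.random.randn(hidden, n_experts)
experts = [np.random.randn(hidden, hidden) for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts)
```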
Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models. Nevertheless, the need for adaptivity requires maintaining second-moment estimates of the per-parameter gradients, which …
External link:
http://arxiv.org/abs/2307.02047
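To make the memory cost mentioned above concrete, here is a minimal sketch of a standard Adam update: besides the parameters themselves, the optimizer keeps first- and second-moment estimates (m, v) of the same shape per parameter. This shows plain Adam only, not the memory-efficient method proposed in the paper.

```python
import numpy as np

# Standard Adam step: m and v are extra per-parameter state the optimizer must store.
def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (per-parameter, extra memory)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.random.randn(1000)
m = np.zeros_like(param)                  # optimizer state 1
v = np.zeros_like(param)                  # optimizer state 2: the second-moment estimate
grad = np.random.randn(1000)
param, m, v = adam_step(param, grad, m, v, t=1)
```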
Recent research has highlighted the importance of dataset size in scaling language models. However, large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit …
External link:
http://arxiv.org/abs/2305.13230
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an efficient LLM …
External link:
http://arxiv.org/abs/2305.13144
Continual learning (CL) can help pre-trained vision-language models efficiently adapt to new or under-trained data distributions without re-training. Nevertheless, during the continual training of the Contrastive Language-Image Pre-training (CLIP) model, …
External link:
http://arxiv.org/abs/2303.06628
Author:
Qin, Ziheng, Wang, Kai, Zheng, Zangwei, Gu, Jianyang, Peng, Xiangyu, Xu, Zhaopan, Zhou, Daquan, Shang, Lei, Sun, Baigui, Xie, Xuansong, You, Yang
Data pruning aims to obtain lossless performances with less overall cost. A common approach is to filter out samples that make less contribution to the training. This could lead to gradient expectation bias compared to the original data. To solve this, …
External link:
http://arxiv.org/abs/2303.04947
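The gradient expectation bias mentioned in this abstract can be illustrated with a small sketch: if pruning naively keeps only "high-contribution" samples, the mean gradient over the kept subset drifts away from the expectation over the full dataset. The toy 1-D gradients and the magnitude-based filter below are illustrative assumptions, not the paper's actual correction method.

```python
import numpy as np

# Toy illustration of gradient expectation bias under naive pruning.
rng = np.random.default_rng(0)
per_sample_grads = rng.normal(loc=0.5, scale=2.0, size=(10_000, 1))  # toy 1-D gradients

full_mean = per_sample_grads.mean(axis=0)

# Naive pruning: drop the samples with the smallest gradient magnitude.
scores = np.abs(per_sample_grads).ravel()
keep = scores > np.quantile(scores, 0.5)           # keep the "top" 50% by magnitude
pruned_mean = per_sample_grads[keep].mean(axis=0)

print("full-data gradient mean:    ", full_mean)
print("pruned-subset gradient mean:", pruned_mean)  # drifts away from full_mean
```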