Showing 1 - 10 of 323 for search: '"Zhang, Yushun"'
Recently, large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks. Typically, an LLM is pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during fine-tuning, LLMs may …
External link:
http://arxiv.org/abs/2407.20999
Author:
Zhang, Yushun, Chen, Congliang, Li, Ziniu, Ding, Tian, Wu, Chenwei, Ye, Yinyu, Luo, Zhi-Quan, Sun, Ruoyu
We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% …
External link:
http://arxiv.org/abs/2406.16793
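The Adam-mini snippet above describes shrinking Adam's per-parameter learning-rate state $1/\sqrt{v}$ to far fewer values. A minimal sketch of that idea, assuming a blockwise-averaged second moment (the function names and per-tensor block granularity are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def adam_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam: the second moment v has the same shape as p."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    p = p - lr * m / (np.sqrt(v) + eps)
    return p, m, v

def adam_mini_step(p, g, m, v_scalar, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of the Adam-mini idea: keep one scalar v per parameter
    block (here, the whole tensor) instead of one entry per parameter."""
    m = b1 * m + (1 - b1) * g
    # Average the squared gradient within the block before updating v.
    v_scalar = b2 * v_scalar + (1 - b2) * np.mean(g * g)
    p = p - lr * m / (np.sqrt(v_scalar) + eps)
    return p, m, v_scalar
```

Keeping one scalar per block replaces a tensor-sized `v` with a single number, which is where the memory saving quoted in the abstract would come from.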
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear. In this work, we provide an explanation through the lens of Hessian: (i) Transformers are "heterogeneous": the Hessian spectrum across parameter blocks …
External link:
http://arxiv.org/abs/2402.16788
Reinforcement Learning from Human Feedback (RLHF) is key to aligning Large Language Models (LLMs), typically paired with the Proximal Policy Optimization (PPO) algorithm. While PPO is a powerful method designed for general reinforcement learning tasks …
External link:
http://arxiv.org/abs/2310.10505
Logs are valuable information for oil and gas fields as they help to determine the lithology of the formations surrounding the borehole and the location and reserves of subsurface oil and gas reservoirs. However, important logs are often missing in h…
External link:
http://arxiv.org/abs/2308.12625
Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can …
External link:
http://arxiv.org/abs/2210.12001
Author:
Wang, Bohan, Zhang, Yushun, Zhang, Huishuai, Meng, Qi, Sun, Ruoyu, Ma, Zhi-Ming, Liu, Tie-Yan, Luo, Zhi-Quan, Chen, Wei
Adam is widely adopted in practical applications due to its fast convergence. However, its theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely on the bounded smoothness assumption, referred to as the …
External link:
http://arxiv.org/abs/2208.09900
Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, many new variants have been designed to obtain convergence. However, vanilla Adam remains exceptionally popular and it works well in practice. Why is there a gap between theory and …
External link:
http://arxiv.org/abs/2208.09632
Published in:
In Heliyon, 15 July 2024, 10(13)
Author:
Zhang, Yushun, Liu, Jian, Qiu, Xinqiang, Li, Wenfeng, Yang, Haochen, Qin, Haixia, Wang, Yanping, Wang, Min, Zhu, Hengkang
Published in:
In Heliyon, 15 April 2024, 10(7)