Search results - "Liu, Peter P"

Report

Scaling Exponents Across Parameterizations and Optimizers

Authors: Everett, Katie; Xiao, Lechao; Wortsman, Mitchell; Alemi, Alexander A.; Novak, Roman; Liu, Peter J.; Gur, Izzeddin; Sohl-Dickstein, Jascha; Kaelbling, Leslie Pack; Lee, Jaehoon; Pennington, Jeffrey

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on pa

External link: http://arxiv.org/abs/2407.05872


Report

LiPO: Listwise Preference Optimization through Learning-to-Rank

Authors: Liu, Tianqi; Qin, Zhen; Wu, Junru; Shen, Jiaming; Khalman, Misha; Joshi, Rishabh; Zhao, Yao; Saleh, Mohammad; Baumgartner, Simon; Liu, Jialu; Liu, Peter J.; Wang, Xuanhui

Aligning language models (LMs) with curated human feedback is critical to control their behaviors in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinfor

External link: http://arxiv.org/abs/2402.01878


Report

Self-Evaluation Improves Selective Generation in Large Language Models

Authors: Ren, Jie; Zhao, Yao; Vu, Tu; Liu, Peter J.; Lakshminarayanan, Balaji

Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, r

External link: http://arxiv.org/abs/2312.09300


Report

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go bey

External link: http://arxiv.org/abs/2312.06585


Report

Frontier Language Models are not Robust to Adversarial Arithmetic, or 'What do I need to say so you agree 2+2=5?'

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial str

External link: http://arxiv.org/abs/2311.07587


Report

Improving Large Language Model Fine-tuning for Solving Math Problems

Authors: Liu, Yixin; Singh, Avi; Freeman, C. Daniel; Co-Reyes, John D.; Liu, Peter J.

Despite their success in many natural language tasks, solving math problems remains a significant challenge for large language models (LLMs). A large gap exists between LLMs' pass-at-one and pass-at-N performance in solving math problems, suggesting

External link: http://arxiv.org/abs/2310.10047


Report

Small-scale proxies for large-scale Transformer training instabilities

Authors: Wortsman, Mitchell; Liu, Peter J.; Xiao, Lechao; Everett, Katie; Alemi, Alex; Adlam, Ben; Co-Reyes, John D.; Gur, Izzeddin; Kumar, Abhishek; Novak, Roman; Pennington, Jeffrey; Sohl-Dickstein, Jascha; Xu, Kelvin; Lee, Jaehoon; Gilmer, Justin; Kornblith, Simon

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific

External link: http://arxiv.org/abs/2309.14322


Report

Statistical Rejection Sampling Improves Preference Optimization

Authors: Liu, Tianqi; Zhao, Yao; Joshi, Rishabh; Khalman, Misha; Saleh, Mohammad; Liu, Peter J.; Liu, Jialu

Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimi

External link: http://arxiv.org/abs/2309.06657
