Search results - "Tang, Wenpin"

Report

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Authors: Zhao, Hanyang; Winata, Genta Indra; Das, Anirban; Zhang, Shi-Xiong; Yao, David D.; Tang, Wenpin; Sahu, Sambit

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding

External link: http://arxiv.org/abs/2410.04203

Report

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Authors: Winata, Genta Indra; Zhao, Hanyang; Das, Anirban; Tang, Wenpin; Yao, David D.; Zhang, Shi-Xiong; Sahu, Sambit

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into t

External link: http://arxiv.org/abs/2409.11564

Report

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Authors: Zhao, Hanyang; Chen, Haoxian; Zhang, Ji; Yao, David D.; Tang, Wenpin

Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent and has also been explored in recent works for alignment of diffusion generative models. In this work, we provide

External link: http://arxiv.org/abs/2409.08400

Report

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Authors: Chen, Haoxian; Zhao, Hanyang; Lam, Henry; Yao, David; Tang, Wenpin

Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLMs). A weakness of DPO, however, lies in i

External link: http://arxiv.org/abs/2405.14953

Report

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

Author: Tang, Wenpin

This paper aims to develop and provide a rigorous treatment of the problem of entropy regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to us

External link: http://arxiv.org/abs/2403.06279

Report

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

Authors: Tang, Wenpin; Zhao, Hanyang

This is an expository article on the score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars in the diffusion modeling -- sampling a

External link: http://arxiv.org/abs/2402.07487

Report

Contractive Diffusion Probabilistic Models

Authors: Tang, Wenpin; Zhao, Hanyang

Diffusion probabilistic models (DPMs) have emerged as a promising technique in generative modeling. The success of DPMs relies on two ingredients: time reversal of diffusion processes and score matching. In view of possibly unguaranteed score matchin

External link: http://arxiv.org/abs/2401.13115

Report

Transaction fee mechanism for Proof-of-Stake protocol

Authors: Tang, Wenpin; Yao, David D.

We study a mechanism design problem in the blockchain proof-of-stake (PoS) protocol. Our main objective is to extend the transaction fee mechanism (TFM) recently proposed in Chung and Shi (SODA, p.3856-3899, 2023), so as to incorporate a long-run uti

External link: http://arxiv.org/abs/2308.13881

Report

Trading and wealth evolution in the Proof of Stake protocol

Author: Tang, Wenpin

With the increasing adoption of the Proof of Stake (PoS) blockchain, it is timely to study the economy created by such blockchain. In this chapter, we will survey recent progress on the trading and wealth evolution in a cryptocurrency where the new c

External link: http://arxiv.org/abs/2308.01803

Report

Finite and infinite weighted exchangeable sequences

Author: Tang, Wenpin

Motivated by recent interests in predictive inference under distribution shift, we study the problem of approximating finite weighted exchangeable sequences by a mixture of finite sequences with independent terms. Various bounds are derived in terms

External link: http://arxiv.org/abs/2306.11584
