Showing 1 - 10 of 129 for search: '"Tang, Wenpin"'
Author:
Zhao, Hanyang, Winata, Genta Indra, Das, Anirban, Zhang, Shi-Xiong, Yao, David D., Tang, Wenpin, Sahu, Sambit
Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding
External link:
http://arxiv.org/abs/2410.04203
Author:
Winata, Genta Indra, Zhao, Hanyang, Das, Anirban, Tang, Wenpin, Yao, David D., Zhang, Shi-Xiong, Sahu, Sambit
Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into t
External link:
http://arxiv.org/abs/2409.11564
Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent and has also been explored in recent works for alignment of diffusion generative models. In this work, we provide
External link:
http://arxiv.org/abs/2409.08400
Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLMs). A weakness of DPO, however, lies in i
External link:
http://arxiv.org/abs/2405.14953
Author:
Tang, Wenpin
This paper aims to develop and provide a rigorous treatment of the problem of entropy-regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to us
External link:
http://arxiv.org/abs/2403.06279
Author:
Tang, Wenpin, Zhao, Hanyang
This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars of diffusion modeling -- sampling a
External link:
http://arxiv.org/abs/2402.07487
Author:
Tang, Wenpin, Zhao, Hanyang
Diffusion probabilistic models (DPMs) have emerged as a promising technique in generative modeling. The success of DPMs relies on two ingredients: time reversal of diffusion processes and score matching. In view of possibly unguaranteed score matchin
External link:
http://arxiv.org/abs/2401.13115
Author:
Tang, Wenpin, Yao, David D.
We study a mechanism design problem in the blockchain proof-of-stake (PoS) protocol. Our main objective is to extend the transaction fee mechanism (TFM) recently proposed in Chung and Shi (SODA, pp. 3856-3899, 2023), so as to incorporate a long-run uti
External link:
http://arxiv.org/abs/2308.13881
Author:
Tang, Wenpin
With the increasing adoption of the Proof of Stake (PoS) blockchain, it is timely to study the economy created by such a blockchain. In this chapter, we will survey recent progress on the trading and wealth evolution in a cryptocurrency where the new c
External link:
http://arxiv.org/abs/2308.01803
Author:
Tang, Wenpin
Motivated by recent interest in predictive inference under distribution shift, we study the problem of approximating finite weighted exchangeable sequences by a mixture of finite sequences with independent terms. Various bounds are derived in terms
External link:
http://arxiv.org/abs/2306.11584