Showing 1 - 10 of 594 for search: '"Yao, David"'
Author:
Zhao, Hanyang, Winata, Genta Indra, Das, Anirban, Zhang, Shi-Xiong, Yao, David D., Tang, Wenpin, Sahu, Sambit
Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding
External link:
http://arxiv.org/abs/2410.04203
Author:
Winata, Genta Indra, Zhao, Hanyang, Das, Anirban, Tang, Wenpin, Yao, David D., Zhang, Shi-Xiong, Sahu, Sambit
Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into t
External link:
http://arxiv.org/abs/2409.11564
Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent and has also been explored in recent works for alignment of diffusion generative models. In this work, we provide
External link:
http://arxiv.org/abs/2409.08400
Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLMs). A weakness of DPO, however, lies in i
External link:
http://arxiv.org/abs/2405.14953
Author:
Tang, Wenpin, Yao, David D.
We study a mechanism design problem in the blockchain proof-of-stake (PoS) protocol. Our main objective is to extend the transaction fee mechanism (TFM) recently proposed in Chung and Shi (SODA, p.3856-3899, 2023), so as to incorporate a long-run uti
External link:
http://arxiv.org/abs/2308.13881
We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in the continuo
External link:
http://arxiv.org/abs/2305.18901
Author:
Yao, David [1] (dtyao@ualberta.ca), Patel, Raj S. [1] (rsp@ualberta.ca), Lam, Adrien [1] (adrien2@ualberta.ca), Glover, Quarshie [2] (gq.quarshie@yahoo.com), Srinivasan, Cindy [2] (csriniva@ualberta.ca), Herchen, Alex [2] (aherchen@ualberta.ca), Ritchie, Bruce [2] (bruce.ritchie@ualberta.ca), Agrawal, Babita [1] (bagrawal@ualberta.ca)
Published in:
International Journal of Molecular Sciences. Sep 2024, Vol. 25, Issue 18, p. 9814. 14 pp.
Author:
Tang, Wenpin, Yao, David D.
We develop a continuous-time control approach to optimal trading in a Proof-of-Stake (PoS) blockchain, formulated as a consumption-investment problem that aims to strike the optimal balance between a participant's (or agent's) utility from holding/tr
External link:
http://arxiv.org/abs/2207.12581
Author:
Tang, Wenpin, Yao, David D.
We propose and study a new class of polynomial voting rules for a general decentralized decision/consensus system, and more specifically for the PoS (Proof of Stake) protocol. The main idea, inspired by the Penrose square-root law and the more recent
External link:
http://arxiv.org/abs/2206.10105