Showing 1 - 10 of 39 for search: '"Zeng, Siliang"'
Author:
Cen, Zhepeng, Liu, Yao, Zeng, Siliang, Chaudhari, Pratik, Rangwala, Huzefa, Karypis, George, Fakoor, Rasool
Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset. However, during inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens…
External link:
http://arxiv.org/abs/2410.14655
Author:
Li, Chenliang, Zeng, Siliang, Liao, Zeyi, Li, Jiaxiang, Kang, Dongyeop, Garcia, Alfredo, Hong, Mingyi
Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into successive stages…
External link:
http://arxiv.org/abs/2406.06874
Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where…
External link:
http://arxiv.org/abs/2405.17888
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective…
External link:
http://arxiv.org/abs/2309.08571
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing…
External link:
http://arxiv.org/abs/2302.07457
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fits observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested structure…
External link:
http://arxiv.org/abs/2210.01808
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy…
External link:
http://arxiv.org/abs/2210.01282
Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL have not been well understood. In this paper, we study the emergence…
External link:
http://arxiv.org/abs/2110.05597
This paper proposes a new algorithm -- the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimization problems…
External link:
http://arxiv.org/abs/2102.07367
We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling…
External link:
http://arxiv.org/abs/2006.11662