Showing 1 - 10 of 350 for search: '"Zhang, Guoxi"'
Author:
Zhang, Guoxi, Duan, Jiuding
This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of large language models (LLMs) to instill human expectations into LLMs. While preference …
External link:
http://arxiv.org/abs/2409.18417
Author:
Chen, Yixin, Zhang, Guoxi, Zhang, Yaowei, Xu, Hongming, Zhi, Peiyuan, Li, Qing, Huang, Siyuan
Recently, large language models (LLMs) have shown strong potential in facilitating human-robot interaction and collaboration. However, existing LLM-based systems often overlook the misalignment between human and robot perceptions, which hinders …
External link:
http://arxiv.org/abs/2409.15684
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations …
External link:
http://arxiv.org/abs/2403.12451
In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged offline preferences, which are preferences collected …
External link:
http://arxiv.org/abs/2403.10160
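The entry above describes PbRL's core idea: fitting a reward function to pairwise human preferences. A minimal sketch of the Bradley-Terry preference likelihood commonly used for this in the PbRL literature (the function name and the scalar inputs are illustrative, not taken from the paper):

```python
import numpy as np

def preference_loss(r_a, r_b, pref):
    """Bradley-Terry negative log-likelihood for one preference pair.

    r_a, r_b: scalar predicted returns (sums of predicted rewards) of
              trajectory segments A and B (hypothetical values here).
    pref:     1.0 if the annotator preferred A, 0.0 if they preferred B.
    """
    # P(A preferred over B) under the Bradley-Terry model
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))
    # Negative log-likelihood of the observed preference label
    return -(pref * np.log(p_a) + (1.0 - pref) * np.log(1.0 - p_a))
```

Minimizing this loss over many labeled pairs pushes the learned reward to assign higher returns to preferred segments; when the two predicted returns are equal, the loss equals ln 2, reflecting a 50/50 guess.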
Published in:
September 2023, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Treatment effect estimation can assist in effective decision-making in e-commerce, medicine, and education. One popular application of this estimation lies in the prediction of the impact of a treatment (e.g., a promotion) on an outcome (e.g., sales) …
External link:
http://arxiv.org/abs/2309.13884
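The entry above concerns estimating a treatment's impact (e.g., a promotion) on an outcome (e.g., sales). A minimal sketch under the simplifying assumption of a randomized treatment, where the average treatment effect (ATE) reduces to a difference in group means; the data and the true effect size of 2.0 are synthetic, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic e-commerce data (illustrative): a randomized promotion
# (treatment) raises expected sales (outcome) by 2.0 units.
n = 10_000
treated = rng.integers(0, 2, size=n)                 # 0/1 promotion flag
sales = 5.0 + 2.0 * treated + rng.normal(0.0, 1.0, size=n)

# With randomized assignment, the ATE is estimated by the difference
# in mean outcomes between the treated and control groups.
ate = sales[treated == 1].mean() - sales[treated == 0].mean()
```

With observational (non-randomized) data this naive difference is biased by confounding, which is why dedicated treatment effect estimators are needed in practice.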
An ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based …
External link:
http://arxiv.org/abs/2302.06101
Author:
Zhang, Guoxi, Kashima, Hisashi
Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims at estimating …
External link:
http://arxiv.org/abs/2211.16078
Author:
Zhang, Guoxi, Kashima, Hisashi
A shortcoming of batch reinforcement learning is its requirement for rewards in the data, making it inapplicable to tasks without reward functions. Existing settings for the lack of rewards, such as behavioral cloning, rely on optimal demonstrations collected from …
External link:
http://arxiv.org/abs/2111.04279
Academic article
This result cannot be displayed to users who are not signed in; sign in to view it.
Academic article
This result cannot be displayed to users who are not signed in; sign in to view it.