Showing 1 - 10 of 350 for search: '"Zhang, Guoxi"'
Author:
Zhang, Guoxi, Duan, Jiuding
This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of large language models (LLMs) to instill human expectations into LLMs. While preference …
External link:
http://arxiv.org/abs/2409.18417
Author:
Chen, Yixin, Zhang, Guoxi, Zhang, Yaowei, Xu, Hongming, Zhi, Peiyuan, Li, Qing, Huang, Siyuan
Recently, large language models (LLMs) have shown strong potential in facilitating human-robot interaction and collaboration. However, existing LLM-based systems often overlook the misalignment between human and robot perceptions, which hinders …
External link:
http://arxiv.org/abs/2409.15684
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations …
External link:
http://arxiv.org/abs/2403.12451
In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged offline preferences, which are preferences collected …
External link:
http://arxiv.org/abs/2403.10160
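The entry above describes PbRL's core idea: fitting a reward function to pairwise human preferences. A minimal sketch of the Bradley-Terry preference likelihood commonly used for this in the PbRL literature (the function name and the scalar inputs are illustrative, not taken from the paper):

```python
import numpy as np

def preference_loss(r_a, r_b, pref):
    """Bradley-Terry negative log-likelihood for one preference pair.

    r_a, r_b: scalar predicted returns (sums of predicted rewards) of
              trajectory segments A and B (hypothetical values here).
    pref:     1.0 if the annotator preferred A, 0.0 if they preferred B.
    """
    # P(A preferred over B) under the Bradley-Terry model
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))
    # Negative log-likelihood of the observed preference label
    return -(pref * np.log(p_a) + (1.0 - pref) * np.log(1.0 - p_a))
```

Minimizing this loss over many labeled pairs pushes the learned reward to assign higher returns to preferred segments; when the two predicted returns are equal, the loss equals ln 2, reflecting a 50/50 guess.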
Published in:
September 2023, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Treatment effect estimation can assist in effective decision-making in e-commerce, medicine, and education. One popular application of this estimation lies in the prediction of the impact of a treatment (e.g., a promotion) on an outcome (e.g., sales) …
External link:
http://arxiv.org/abs/2309.13884
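The entry above concerns estimating a treatment's impact (e.g., a promotion) on an outcome (e.g., sales). A minimal sketch under the simplifying assumption of a randomized treatment, where the average treatment effect (ATE) reduces to a difference in group means; the data and the true effect size of 2.0 are synthetic, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic e-commerce data (illustrative): a randomized promotion
# (treatment) raises expected sales (outcome) by 2.0 units.
n = 10_000
treated = rng.integers(0, 2, size=n)                 # 0/1 promotion flag
sales = 5.0 + 2.0 * treated + rng.normal(0.0, 1.0, size=n)

# With randomized assignment, the ATE is estimated by the difference
# in mean outcomes between the treated and control groups.
ate = sales[treated == 1].mean() - sales[treated == 0].mean()
```

With observational (non-randomized) data this naive difference is biased by confounding, which is why dedicated treatment effect estimators are needed in practice.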
An ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based …
External link:
http://arxiv.org/abs/2302.06101
Author:
Zhang, Guoxi, Kashima, Hisashi
Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims at estimating …
External link:
http://arxiv.org/abs/2211.16078
Author:
Zhang, Guoxi, Kashima, Hisashi
A shortcoming of batch reinforcement learning is its requirement for rewards in the data, making it inapplicable to tasks without reward functions. Existing settings for the lack of rewards, such as behavioral cloning, rely on optimal demonstrations collected from …
External link:
http://arxiv.org/abs/2111.04279
Academic article
This result cannot be displayed to users who are not signed in; sign in to view it.
Academic article
This result cannot be displayed to users who are not signed in; sign in to view it.