Showing 1 - 6 of 6 for search: '"Bai, Fengshuo"'
Author:
Bai, Fengshuo, Wang, Mingzhi, Zhang, Zhaowei, Chen, Boyuan, Xu, Yinda, Wen, Ying, Yang, Yaodong
With recent advancements in large language models (LLMs), alignment has emerged as an effective technique for keeping LLMs in consensus with human intent. Current methods primarily involve direct training through Supervised Fine-tuning (SFT) or Reinforc
External link:
http://arxiv.org/abs/2405.18718
Author:
Bai, Fengshuo, Zhao, Rui, Zhang, Hongming, Cui, Sijia, Wen, Ying, Yang, Yaodong, Xu, Bo, Han, Lei
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the lear
External link:
http://arxiv.org/abs/2405.18688
The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety. While considerable strides have been made in addressing AI alignment challenges, existing methodol
External link:
http://arxiv.org/abs/2402.12907
Personal values are a crucial factor behind human decision-making. Considering that Large Language Models (LLMs) have been shown to impact human decisions significantly, it is essential to make sure they accurately understand human values to ensure t
External link:
http://arxiv.org/abs/2310.00378
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
In preference-based Reinforcement Learning (RL), obtaining a large number of preference labels is both time-consuming and costly. Furthermore, the queried human preferences cannot be utilized for new tasks. In this paper, we propose Zero-shot Cr
External link:
http://arxiv.org/abs/2306.03615
Preference-based Reinforcement Learning (PbRL) has demonstrated remarkable efficacy in aligning rewards with human intentions. However, a significant challenge lies in the need for substantial human labels, which are costly and time-consuming to obtain. Addition
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cb9185284bc6bf160d02df22362b13ec
http://arxiv.org/abs/2306.03615