Showing 1 - 9 of 9 for search: '"Lou, Xingzhou"'
Reward models (RM) play a critical role in aligning generations of large language models (LLM) to human expectations. However, prevailing RMs fail to capture the stochasticity within human preferences and cannot effectively evaluate the reliability of …
External link:
http://arxiv.org/abs/2410.00847
Author:
Yan, Yuzi; Lou, Xingzhou; Li, Jialian; Zhang, Yiping; Xie, Jian; Yu, Chao; Wang, Yu; Yan, Dong; Shen, Yuan
As Large Language Models (LLMs) continue to progress toward more advanced forms of intelligence, Reinforcement Learning from Human Feedback (RLHF) is increasingly seen as a key pathway toward achieving Artificial General Intelligence (AGI). However, …
External link:
http://arxiv.org/abs/2409.15360
Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation …
External link:
http://arxiv.org/abs/2405.17009
Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with the complexity …
External link:
http://arxiv.org/abs/2405.12739
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility …
External link:
http://arxiv.org/abs/2401.07553
Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions by some agents …
External link:
http://arxiv.org/abs/2312.15667
Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) The diversity …
External link:
http://arxiv.org/abs/2301.06387
Academic article
Author:
Lou, Xingzhou; Yin, Qiyue; Zhang, Junge; Yu, Chao; He, Zhaofeng; Cheng, Nengjie; Huang, Kaiqi
Published in:
In Information Sciences, vol. 610 (September 2022), pp. 746-758