Showing 1 - 10 of 26 results for search: '"Zhang, Shenao"'
Author:
Zhang, Shenao, Yu, Donghan, Sharma, Hiteshi, Yang, Ziyi, Wang, Shuohang, Hassan, Hany, Wang, Zhaoran
Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) with human intentions. Unlike offline alignment with a fixed dataset, …
External link:
http://arxiv.org/abs/2405.19332
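The snippet above is truncated before it describes the paper's method, but the objective this line of work builds on, pairwise preference optimization against a frozen reference model, is standard. A minimal DPO-style sketch in PyTorch (the tensor names and the beta default are illustrative assumptions, not this paper's implementation):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style pairwise preference loss (illustrative, not the paper's method).

    Each input is a 1-D tensor of summed per-token log-probabilities of the
    chosen / rejected response under the policy or the frozen reference model.
    """
    # Log-ratios measure how far the policy has moved from the reference.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry model: the chosen response should win the comparison.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In the online setting the abstract contrasts with, the (chosen, rejected) pair would come from freshly sampled model responses rather than a fixed offline dataset.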
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Author:
Liu, Zhihan, Lu, Miao, Zhang, Shenao, Liu, Boyi, Guo, Hongyi, Yang, Yingxiang, Blanchet, Jose, Wang, Zhaoran
Aligning generative models with human preferences via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled …
External link:
http://arxiv.org/abs/2405.16436
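The title states the paper's central observation: keeping the SFT objective on preferred responses acts as a regularizer against reward overoptimization. A hedged sketch of one way such a combined loss can look (the eta weight and tensor names are assumptions for illustration, reusing the conventions of the sketch above):

```python
import torch.nn.functional as F

def preference_loss_with_sft(policy_chosen_logps, policy_rejected_logps,
                             ref_chosen_logps, ref_rejected_logps,
                             beta=0.1, eta=1.0):
    """Preference loss plus an SFT term (illustrative combination).

    The SFT term anchors the policy to the preferred data, which is the
    regularization effect the title refers to.
    """
    margin = (policy_chosen_logps - ref_chosen_logps) \
             - (policy_rejected_logps - ref_rejected_logps)
    preference_loss = -F.logsigmoid(beta * margin).mean()
    # SFT / maximum-likelihood term on the chosen responses.
    sft_loss = -policy_chosen_logps.mean()
    # eta trades off preference fitting against staying near the data
    # distribution; its value here is an assumed hyperparameter.
    return preference_loss + eta * sft_loss
```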
Author:
Zhang, Shenao, Zheng, Sirui, Ke, Shuqi, Liu, Zhihan, Jin, Wanxin, Yuan, Jianbo, Yang, Yingxiang, Yang, Hongxia, Wang, Zhaoran
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. However, RL algorithms may require extensive trial-and-error interactions to collect useful …
External link:
http://arxiv.org/abs/2402.16181
ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, …
External link:
http://arxiv.org/abs/2310.19927
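For context on what a reparameterization policy gradient is: the action sample is written as a deterministic, differentiable function of the policy parameters and exogenous noise, so the return of a differentiable rollout can be backpropagated end to end. A minimal sketch, assuming `policy`, `dynamics`, and `reward` are differentiable callables (all hypothetical interfaces, not the cited paper's algorithm):

```python
import torch

def rp_gradient_objective(policy, dynamics, reward, s0, horizon=10):
    """Negative return of a differentiable rollout (illustrative RP-PGM sketch)."""
    s, total_return = s0, 0.0
    for _ in range(horizon):
        mu, log_std = policy(s)
        # Reparameterization trick: sample = mu + sigma * eps with
        # eps ~ N(0, I), so gradients flow through mu and log_std.
        a = mu + log_std.exp() * torch.randn_like(mu)
        total_return = total_return + reward(s, a)
        s = dynamics(s, a)
    # Minimizing this backpropagates through every step of the rollout,
    # which is where long-horizon gradient pathologies can arise.
    return -total_return.mean()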
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating reasoning into actions in the real world remains challenging. In particular, it remains unclear how to complete a given task provably within a minimum number of …
External link:
http://arxiv.org/abs/2309.17382
Author:
Liu, Zhihan, Lu, Miao, Xiong, Wei, Zhong, Han, Hu, Hao, Zhang, Shenao, Zheng, Sirui, Yang, Zhuoran, Wang, Zhaoran
In online reinforcement learning (online RL), balancing exploration and exploitation is crucial for finding an optimal policy in a sample-efficient way. To achieve this, existing sample-efficient online RL algorithms typically consist of three components …
External link:
http://arxiv.org/abs/2305.18258
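The snippet cuts off mid-list; in this literature the components are typically (value or model) estimation, planning, and exploration, the last often via an optimism bonus. As a self-contained illustration of the estimation-plus-exploration pattern (the environment interface is assumed, and this is not the cited paper's algorithm), consider tabular Q-learning with a count-based bonus:

```python
import numpy as np

def q_learning_with_bonus(env, n_states, n_actions, episodes=500,
                          gamma=0.99, alpha=0.1, c=1.0):
    """Tabular Q-learning with a UCB-style count bonus (illustrative).

    Assumes env.reset() -> state and env.step(a) -> (state, reward, done).
    """
    Q = np.zeros((n_states, n_actions))
    counts = np.ones((n_states, n_actions))  # init 1 to avoid divide-by-zero
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Exploration: act greedily on Q plus an optimism bonus that
            # shrinks as (s, a) is visited more often.
            a = int(np.argmax(Q[s] + c / np.sqrt(counts[s])))
            s2, r, done = env.step(a)
            counts[s, a] += 1
            # Estimation: one-step TD update toward the bootstrapped target.
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```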
With strong capabilities of reasoning and a broad understanding of the world, Large Language Models (LLMs) have demonstrated immense potential in building versatile embodied decision-making agents capable of executing a wide array of tasks. Nevertheless, …
External link:
http://arxiv.org/abs/2305.15695
Author:
Zhang, Shenao
Provably efficient Model-Based Reinforcement Learning (MBRL) based on optimism or posterior sampling (PSRL) is guaranteed to attain global optimality asymptotically by introducing a complexity measure of the model. However, the complexity might grow …
External link:
http://arxiv.org/abs/2209.07676
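Since the record names posterior sampling (PSRL) explicitly, a compact tabular version helps fix ideas: sample one MDP from the current posterior, plan in it, act greedily, and update the posterior with the observed transitions. The Dirichlet prior and the environment interface below are standard-but-assumed details, not this paper's construction:

```python
import numpy as np

def psrl_tabular(env, n_states, n_actions, horizon, episodes=100):
    """Posterior Sampling for RL on a tabular MDP (illustrative sketch).

    Assumes env.reset() -> state and env.step(a) -> (state, reward, done).
    """
    # Dirichlet(1, ..., 1) prior over next-state distributions per (s, a),
    # plus a running mean estimate of rewards.
    trans_counts = np.ones((n_states, n_actions, n_states))
    r_sum = np.zeros((n_states, n_actions))
    r_cnt = np.ones((n_states, n_actions))
    for _ in range(episodes):
        # 1) Sample one plausible MDP from the posterior.
        P = np.array([[np.random.dirichlet(trans_counts[s, a])
                       for a in range(n_actions)] for s in range(n_states)])
        R = r_sum / r_cnt
        # 2) Plan: finite-horizon value iteration in the sampled MDP.
        Q = np.zeros((horizon + 1, n_states, n_actions))
        for h in range(horizon - 1, -1, -1):
            V = Q[h + 1].max(axis=1)
            Q[h] = R + P @ V
        # 3) Act as if the sampled MDP were true; update the posterior.
        s = env.reset()
        for h in range(horizon):
            a = int(np.argmax(Q[h, s]))
            s2, r, done = env.step(a)
            trans_counts[s, a, s2] += 1
            r_sum[s, a] += r
            r_cnt[s, a] += 1
            s = s2
            if done:
                break
    return trans_counts, r_sum / r_cnt
```

Randomizing over models in step 1 is what drives exploration here, in place of the explicit optimism bonus of the sketch above.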
In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number. Every single MG induced by varying the population may possess distinct optimal joint strategies and …
External link:
http://arxiv.org/abs/2108.12988
Published in:
Heliyon, 30 June 2024, 10(12)