Showing 1 - 4 of 4 for search: '"Gumaste, Rohan"'
Large Language Models (LLMs) are widely used for tasks such as natural language and code generation. Still, their outputs often suffer from issues such as privacy violations and semantically inaccurate code generation. Current libraries for LLM generat…
External link:
http://arxiv.org/abs/2410.07295
Offline reinforcement learning has become one of the most practical RL settings. However, most existing works on offline RL focus on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich un…
External link:
http://arxiv.org/abs/2406.10445
We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model where the attacker is entirely oblivious to the learning algorithm,…
External link:
http://arxiv.org/abs/2402.09695
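The threat model in the snippet above (an attacker who knows nothing about the learning algorithm and can only corrupt the rewards in the offline dataset) can be illustrated with a minimal sketch. The helper below is hypothetical, not the paper's actual attack; it simply corrupts a bounded number of rewards using only information visible in the dataset:

```python
def poison_rewards(dataset, budget, epsilon):
    """Hypothetical black-box reward-poisoning sketch: corrupt at most
    `budget` transitions by subtracting `epsilon` from their rewards,
    using no knowledge of the downstream learner -- only the dataset."""
    poisoned = [dict(t) for t in dataset]  # copy; attacker edits rewards only
    # Target the highest-reward transitions to discourage the behaviour
    # they would otherwise reinforce.
    targets = sorted(range(len(poisoned)),
                     key=lambda i: poisoned[i]["reward"],
                     reverse=True)[:budget]
    for i in targets:
        poisoned[i]["reward"] -= epsilon
    return poisoned

# Toy offline dataset of (state, action, reward) transitions.
data = [{"state": s, "action": a, "reward": r}
        for s, a, r in [(0, 1, 1.0), (1, 0, 0.2), (2, 1, 0.9)]]
attacked = poison_rewards(data, budget=1, epsilon=0.5)
```

Because the attack reads only the dataset, it applies unchanged to any offline RL learner trained on it, which is what makes such a threat model "universal".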
Author:
Xu, Yinglun, Suresh, Tarun, Gumaste, Rohan, Zhu, David, Li, Ruirui, Wang, Zhengyang, Jiang, Haoming, Tang, Xianfeng, Yin, Qingyu, Cheng, Monica Xiao, Zeng, Qi, Zhang, Chao, Singh, Gagandeep
Preference-based reinforcement learning (PBRL) in the offline setting has succeeded greatly in industrial applications such as chatbots. A two-step learning framework where one applies a reinforcement learning step after a reward modeling step has be…
External link:
http://arxiv.org/abs/2401.00330
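The two-step framework mentioned in the last snippet can be sketched in miniature: first fit a reward model from pairwise preferences (here via the standard Bradley-Terry model), then run an RL step on the learned reward. In this toy bandit the RL step collapses to acting greedily; all data and function names are illustrative, not from the paper:

```python
import math

def fit_reward(prefs, n_actions, lr=0.5, epochs=200):
    """Step 1: reward modeling. Learn a per-action reward from pairwise
    preferences via gradient ascent on the Bradley-Terry log-likelihood."""
    r = [0.0] * n_actions
    for _ in range(epochs):
        for win, lose in prefs:
            # P(win preferred over lose) under Bradley-Terry
            p = 1.0 / (1.0 + math.exp(r[lose] - r[win]))
            g = 1.0 - p  # gradient of log-likelihood w.r.t. r[win]
            r[win] += lr * g
            r[lose] -= lr * g
    return r

# Toy preference data over 3 actions; preferred action listed first.
prefs = [(2, 1), (2, 0), (1, 0)] * 10
r = fit_reward(prefs, n_actions=3)

# Step 2: "reinforcement learning" on the learned reward -- in this
# stateless bandit it reduces to picking the argmax action.
policy = max(range(3), key=lambda a: r[a])
```

The same two-step shape underlies RLHF-style chatbot training: a preference-trained reward model followed by a policy-optimization step against it.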