Showing 1 - 10
of 163
for search: '"Watkins, Olivia"'
Author:
Souly, Alexandra, Lu, Qingyuan, Bowen, Dillon, Trinh, Tu, Hsieh, Elvis, Pandey, Sana, Abbeel, Pieter, Svegliato, Justin, Emmons, Scott, Watkins, Olivia, Toyer, Sam
Most jailbreak papers claim the jailbreaks they propose are highly effective, often boasting near-100% attack success rates. However, it is perhaps more common than not for jailbreak developers to substantially exaggerate the effectiveness of their …
External link:
http://arxiv.org/abs/2402.10260
Author:
Toyer, Sam, Watkins, Olivia, Mendes, Ethan Adrian, Svegliato, Justin, Bailey, Luke, Wang, Tiffany, Ong, Isaac, Elmaaroufi, Karim, Abbeel, Pieter, Darrell, Trevor, Ritter, Alan, Russell, Stuart
While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third-party prompts that subvert the intent of the system designer. To help researchers study …
External link:
http://arxiv.org/abs/2311.01011
Author:
Lin, Jessy, Du, Yuqing, Watkins, Olivia, Hafner, Danijar, Abbeel, Pieter, Klein, Dan, Dragan, Anca
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that …
External link:
http://arxiv.org/abs/2308.01399
Author:
Fan, Ying, Watkins, Olivia, Du, Yuqing, Liu, Hao, Ryu, Moonkyung, Boutilier, Craig, Abbeel, Pieter, Ghavamzadeh, Mohammad, Lee, Kangwook, Lee, Kimin
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though …
External link:
http://arxiv.org/abs/2305.16381
Author:
Lee, Kimin, Liu, Hao, Ryu, Moonkyung, Watkins, Olivia, Du, Yuqing, Boutilier, Craig, Abbeel, Pieter, Ghavamzadeh, Mohammad, Gu, Shixiang Shane
Deep generative models have shown impressive results in text-to-image synthesis. However, current text-to-image models often generate images that are inadequately aligned with text prompts. We propose a fine-tuning method for aligning such models …
External link:
http://arxiv.org/abs/2302.12192
Author:
Du, Yuqing, Watkins, Olivia, Wang, Zihan, Colas, Cédric, Darrell, Trevor, Abbeel, Pieter, Gupta, Abhishek, Andreas, Jacob
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these …
External link:
http://arxiv.org/abs/2302.06692
Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human …
External link:
http://arxiv.org/abs/2203.11197
Author:
Frost, Julius, Watkins, Olivia, Weiner, Eric, Abbeel, Pieter, Darrell, Trevor, Plummer, Bryan, Saenko, Kate
In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test time. Some policy interpretability methods facilitate this by capturing the policy's …
External link:
http://arxiv.org/abs/2201.12462
Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world. Current approaches to tackle this problem …
External link:
http://arxiv.org/abs/2104.07662
Many challenges in natural language processing require generating text, including language translation, dialogue generation, and speech recognition. For all of these problems, text generation becomes more difficult as the text becomes longer. Current …
External link:
http://arxiv.org/abs/1810.08802