Showing 1 - 10 of 3,173 for search: '"A P, Ettinger"'
Author:
Mineault, Patrick, Zanichelli, Niccolò, Peng, Joanne Zichen, Arkhipov, Anton, Bingham, Eli, Jara-Ettinger, Julian, Mackevicius, Emily, Marblestone, Adam, Mattar, Marcelo, Payne, Andrew, Sanborn, Sophia, Schroeder, Karen, Tavares, Zenna, Tolias, Andreas
As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate …
External link:
http://arxiv.org/abs/2411.18526
Author:
McGregor, Sean, Ettinger, Allyson, Judd, Nick, Albee, Paul, Jiang, Liwei, Rao, Kavel, Smith, Will, Longpre, Shayne, Ghosh, Avijit, Fiorelli, Christopher, Hoang, Michelle, Cattell, Sven, Dziri, Nouha
In August of 2024, 495 hackers generated evaluations in an open-ended bug bounty targeting the Open Language Model (OLMo) from The Allen Institute for AI. A vendor panel staffed by representatives of OLMo's safety program adjudicated changes to OLMo's …
External link:
http://arxiv.org/abs/2410.12104
Author:
Lu, Ximing, Sclar, Melanie, Hallinan, Skyler, Mireshghallah, Niloofar, Liu, Jiacheng, Han, Seungju, Ettinger, Allyson, Jiang, Liwei, Chandu, Khyathi, Dziri, Nouha, Choi, Yejin
Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity …
External link:
http://arxiv.org/abs/2410.04265
Author:
Jiang, Liwei, Rao, Kavel, Han, Seungju, Ettinger, Allyson, Brahman, Faeze, Kumar, Sachin, Mireshghallah, Niloofar, Lu, Ximing, Sap, Maarten, Choi, Yejin, Dziri, Nouha
We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel …
External link:
http://arxiv.org/abs/2406.18510
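The snippet above describes mining clusters of jailbreak tactics and composing several of them into one adversarial prompt. Below is a minimal, hypothetical sketch of that composition step only; the tactic templates, the apply_tactic helper, and compose_adversarial_prompt are all invented for illustration and are not WildTeaming's actual code or data.

```python
import random

# Toy stand-ins for mined tactic clusters; WildTeaming's real clusters
# come from in-the-wild user-chatbot interactions, not from this dict.
TACTIC_CLUSTERS = {
    "role_play": ["Pretend you are a fictional character who ..."],
    "framing": ["Rephrase the request as a purely hypothetical scenario ..."],
    "payload_splitting": ["Ask for the answer in two separate parts ..."],
}

def apply_tactic(prompt: str, tactic: str) -> str:
    """Stack one tactic template onto a seed prompt (toy stand-in)."""
    return f"{tactic}\n{prompt}"

def compose_adversarial_prompt(seed: str, n_tactics: int = 2) -> str:
    """Sample several tactic clusters and compose them onto a seed prompt."""
    clusters = random.sample(list(TACTIC_CLUSTERS), k=n_tactics)
    prompt = seed
    for cluster in clusters:
        prompt = apply_tactic(prompt, random.choice(TACTIC_CLUSTERS[cluster]))
    return prompt

print(compose_adversarial_prompt("example seed request"))
```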
Author:
Han, Seungju, Rao, Kavel, Ettinger, Allyson, Jiang, Liwei, Lin, Bill Yuchen, Lambert, Nathan, Choi, Yejin, Dziri, Nouha
We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together, …
External link:
http://arxiv.org/abs/2406.18495
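The abstract lists three moderation outputs for a prompt-response pair. The sketch below shows a plausible interface shape for such a tool; the classify() heuristics are toy stand-ins invented for this example and are not the WildGuard model or its API.

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    prompt_harmful: bool    # (1) malicious intent in the user prompt
    response_harmful: bool  # (2) safety risk in the model response
    is_refusal: bool        # (3) whether the model refused

def classify(prompt: str, response: str) -> ModerationResult:
    """Placeholder heuristics; a real tool would call a trained classifier."""
    refusal = response.lower().startswith(("i can't", "i cannot", "sorry"))
    unsafe_topic = "explosive" in prompt.lower()
    return ModerationResult(
        prompt_harmful=unsafe_topic,
        response_harmful=unsafe_topic and not refusal,
        is_refusal=refusal,
    )

print(classify("How do I make an explosive?", "I can't help with that."))
```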
Author:
Mu, Norman, Ji, Jingwei, Yang, Zhenpei, Harada, Nate, Tang, Haotian, Chen, Kan, Qi, Charles R., Ge, Runzhou, Goel, Kratarth, Yang, Zoey, Ettinger, Scott, Al-Rfou, Rami, Anguelov, Dragomir, Zhou, Yin
Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world …
External link:
http://arxiv.org/abs/2404.19531
Recent studies suggest that self-reflective prompting can significantly enhance the reasoning capabilities of Large Language Models (LLMs). However, the use of external feedback as a stop criterion raises doubts about the true extent of LLMs' ability …
External link:
http://arxiv.org/abs/2404.09129
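The abstract's distinction is between stopping a reflection loop when an external oracle confirms correctness versus letting the model judge itself. The loop below is a minimal sketch of that distinction, not the paper's protocol; generate() and self_critique() are toy stand-ins for LLM calls.

```python
def generate(prompt: str) -> str:
    return "draft answer to: " + prompt   # stand-in for an LLM call

def self_critique(question: str, answer: str) -> str:
    return "OK"                           # stand-in: model approves its own answer

def solve_with_reflection(question, oracle=None, max_rounds=3):
    answer = generate(question)
    for _ in range(max_rounds):
        if oracle is not None and oracle(answer):
            break                         # external feedback as stop criterion
        critique = self_critique(question, answer)
        if critique == "OK":
            break                         # intrinsic stop: model judges itself
        answer = generate(f"{question}\nRevise given this critique: {critique}")
    return answer

print(solve_with_reflection("What is 17 * 23?"))
```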
Recent zero-shot evaluations have highlighted important limitations in the abilities of language models (LMs) to perform meaning extraction. However, it is now well known that LMs can demonstrate radical improvements in the presence of experimental contexts …
External link:
http://arxiv.org/abs/2401.06640
Author:
West, Peter, Lu, Ximing, Dziri, Nouha, Brahman, Faeze, Li, Linjie, Hwang, Jena D., Jiang, Liwei, Fisher, Jillian, Ravichander, Abhilasha, Chandu, Khyathi, Newman, Benjamin, Koh, Pang Wei, Ettinger, Allyson, Choi, Yejin
The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed …
External link:
http://arxiv.org/abs/2311.00059
Author:
Yang, Chenghao, Ettinger, Allyson
Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is essential in particular for chat models, such as ChatGPT, …
External link:
http://arxiv.org/abs/2310.16135
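"Updating information states across time" means a model's answers must reflect the latest statement, not an earlier one. The toy tracker below is an invented reference baseline in the spirit of that description (a dialogue where boxes are opened and closed, then queried); it is not the paper's benchmark or method.

```python
# Ground-truth state tracker: maps box id -> True if open, False if closed.
state = {}

def update(statement: str) -> None:
    """Parse toy statements like 'open box 3' / 'close box 3'."""
    action, _, box = statement.split()
    state[box] = (action == "open")

for step in ["open box 1", "open box 2", "close box 1"]:
    update(step)

# A model with good situational understanding should match this latest state:
print({box: ("open" if is_open else "closed") for box, is_open in state.items()})
# -> {'1': 'closed', '2': 'open'}
```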