Výsledky vyhledávání - "A. P. Ettinger"

Report

Autor: Mineault, Patrick, Zanichelli, Niccolò, Peng, Joanne Zichen, Arkhipov, Anton, Bingham, Eli, Jara-Ettinger, Julian, Mackevicius, Emily, Marblestone, Adam, Mattar, Marcelo, Payne, Andrew, Sanborn, Sophia, Schroeder, Karen, Tavares, Zenna, Tolias, Andreas

As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviat

Externí odkaz: http://arxiv.org/abs/2411.18526

Zobrazit plný text záznamu

Report

To Err is AI : A Case Study Informing LLM Flaw Reporting Practices

Autor: McGregor, Sean, Ettinger, Allyson, Judd, Nick, Albee, Paul, Jiang, Liwei, Rao, Kavel, Smith, Will, Longpre, Shayne, Ghosh, Avijit, Fiorelli, Christopher, Hoang, Michelle, Cattell, Sven, Dziri, Nouha

In August of 2024, 495 hackers generated evaluations in an open-ended bug bounty targeting the Open Language Model (OLMo) from The Allen Institute for AI. A vendor panel staffed by representatives of OLMo's safety program adjudicated changes to OLMo'

Externí odkaz: http://arxiv.org/abs/2410.12104

Zobrazit plný text záznamu

Report

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Autor: Lu, Ximing, Sclar, Melanie, Hallinan, Skyler, Mireshghallah, Niloofar, Liu, Jiacheng, Han, Seungju, Ettinger, Allyson, Jiang, Liwei, Chandu, Khyathi, Dziri, Nouha, Choi, Yejin

Creativity has long been considered one of the most difficult aspect of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativ

Externí odkaz: http://arxiv.org/abs/2410.04265

Zobrazit plný text záznamu

Report

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Autor: Jiang, Liwei, Rao, Kavel, Han, Seungju, Ettinger, Allyson, Brahman, Faeze, Kumar, Sachin, Mireshghallah, Niloofar, Lu, Ximing, Sap, Maarten, Choi, Yejin, Dziri, Nouha

We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of nov

Externí odkaz: http://arxiv.org/abs/2406.18510

Zobrazit plný text záznamu

Report

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Autor: Han, Seungju, Rao, Kavel, Ettinger, Allyson, Jiang, Liwei, Lin, Bill Yuchen, Lambert, Nathan, Choi, Yejin, Dziri, Nouha

We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Togethe

Externí odkaz: http://arxiv.org/abs/2406.18495

Zobrazit plný text záznamu

Report

MoST: Multi-modality Scene Tokenization for Motion Prediction

Autor: Mu, Norman, Ji, Jingwei, Yang, Zhenpei, Harada, Nate, Tang, Haotian, Chen, Kan, Qi, Charles R., Ge, Runzhou, Goel, Kratarth, Yang, Zoey, Ettinger, Scott, Al-Rfou, Rami, Anguelov, Dragomir, Zhou, Yin

Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real wor

Externí odkaz: http://arxiv.org/abs/2404.19531

Zobrazit plný text záznamu

Report

When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

Autor: Li, Yanhong, Yang, Chenghao, Ettinger, Allyson

Recent studies suggest that self-reflective prompting can significantly enhance the reasoning capabilities of Large Language Models (LLMs). However, the use of external feedback as a stop criterion raises doubts about the true extent of LLMs' ability

Externí odkaz: http://arxiv.org/abs/2404.09129

Zobrazit plný text záznamu

Report

Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, but Inconsistently

Autor: Misra, Kanishka, Ettinger, Allyson, Mahowald, Kyle

Recent zero-shot evaluations have highlighted important limitations in the abilities of language models (LMs) to perform meaning extraction. However, it is now well known that LMs can demonstrate radical improvements in the presence of experimental c

Externí odkaz: http://arxiv.org/abs/2401.06640

Zobrazit plný text záznamu

Report

The Generative AI Paradox: 'What It Can Create, It May Not Understand'

Autor: West, Peter, Lu, Ximing, Dziri, Nouha, Brahman, Faeze, Li, Linjie, Hwang, Jena D., Jiang, Liwei, Fisher, Jillian, Ravichander, Abhilasha, Chandu, Khyathi, Newman, Benjamin, Koh, Pang Wei, Ettinger, Allyson, Choi, Yejin

The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or excee

Externí odkaz: http://arxiv.org/abs/2311.00059

Zobrazit plný text záznamu

Report

Can You Follow Me? Testing Situational Understanding in ChatGPT

Autor: Yang, Chenghao, Ettinger, Allyson

Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is essential in particular for chat models, such as ChatG

Externí odkaz: http://arxiv.org/abs/2310.16135

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání