Showing 1 - 10 of 585 for the search: '"Jiang, Liwei"'
Author:
Zhou, Xuhui, Kim, Hyunwoo, Brahman, Faeze, Jiang, Liwei, Zhu, Hao, Lu, Ximing, Xu, Frank, Lin, Bill Yuchen, Choi, Yejin, Mireshghallah, Niloofar, Bras, Ronan Le, Sap, Maarten
AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM…
External link:
http://arxiv.org/abs/2409.16427
Author:
Zhao, Wenting, Goyal, Tanya, Chiu, Yu Ying, Jiang, Liwei, Newman, Benjamin, Ravichander, Abhilasha, Chandu, Khyathi, Bras, Ronan Le, Cardie, Claire, Deng, Yuntian, Choi, Yejin
While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap…
External link:
http://arxiv.org/abs/2407.17468
Author:
Jiang, Liwei, Rao, Kavel, Han, Seungju, Ettinger, Allyson, Brahman, Faeze, Kumar, Sachin, Mireshghallah, Niloofar, Lu, Ximing, Sap, Maarten, Choi, Yejin, Dziri, Nouha
We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel…
External link:
http://arxiv.org/abs/2406.18510
Author:
Han, Seungju, Rao, Kavel, Ettinger, Allyson, Jiang, Liwei, Lin, Bill Yuchen, Lambert, Nathan, Choi, Yejin, Dziri, Nouha
We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together…
External link:
http://arxiv.org/abs/2406.18495
Author:
Li, Huihan, Jiang, Liwei, Hwang, Jena D., Kim, Hyunwoo, Santy, Sebastin, Sorensen, Taylor, Lin, Bill Yuchen, Dziri, Nouha, Ren, Xiang, Choi, Yejin
As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models…
External link:
http://arxiv.org/abs/2404.10199
Author:
Chiu, Yu Ying, Jiang, Liwei, Antoniak, Maria, Park, Chan Young, Li, Shuyue Stella, Bhatia, Mehar, Ravi, Sahithya, Tsvetkov, Yulia, Shwartz, Vered, Choi, Yejin
Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current…
External link:
http://arxiv.org/abs/2404.06664
Author:
Mun, Jimin, Jiang, Liwei, Liang, Jenny, Cheong, Inyoung, DeCario, Nicole, Choi, Yejin, Kohno, Tadayoshi, Sap, Maarten
General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating with…
External link:
http://arxiv.org/abs/2403.14791
Author:
Jung, Jaehun, Lu, Ximing, Jiang, Liwei, Brahman, Faeze, West, Peter, Koh, Pang Wei, Choi, Yejin
The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models. While increasingly ubiquitous dependence on such large-scale language models…
External link:
http://arxiv.org/abs/2403.13780
The permanence of online content combined with the enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers…
External link:
http://arxiv.org/abs/2402.08761
Author:
Sorensen, Taylor, Moore, Jared, Fisher, Jillian, Gordon, Mitchell, Mireshghallah, Niloofar, Rytting, Christopher Michael, Ye, Andre, Jiang, Liwei, Lu, Ximing, Dziri, Nouha, Althoff, Tim, Choi, Yejin
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research…
External link:
http://arxiv.org/abs/2402.05070