Showing 1 - 10 of 585 for the search: '"Jiang, Liwei"'
Author:
Zhou, Xuhui, Kim, Hyunwoo, Brahman, Faeze, Jiang, Liwei, Zhu, Hao, Lu, Ximing, Xu, Frank, Lin, Bill Yuchen, Choi, Yejin, Mireshghallah, Niloofar, Bras, Ronan Le, Sap, Maarten
AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM…
External link:
http://arxiv.org/abs/2409.16427
Author:
Zhao, Wenting, Goyal, Tanya, Chiu, Yu Ying, Jiang, Liwei, Newman, Benjamin, Ravichander, Abhilasha, Chandu, Khyathi, Bras, Ronan Le, Cardie, Claire, Deng, Yuntian, Choi, Yejin
While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap…
External link:
http://arxiv.org/abs/2407.17468
Author:
Jiang, Liwei, Rao, Kavel, Han, Seungju, Ettinger, Allyson, Brahman, Faeze, Kumar, Sachin, Mireshghallah, Niloofar, Lu, Ximing, Sap, Maarten, Choi, Yejin, Dziri, Nouha
We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel…
External link:
http://arxiv.org/abs/2406.18510
Author:
Han, Seungju, Rao, Kavel, Ettinger, Allyson, Jiang, Liwei, Lin, Bill Yuchen, Lambert, Nathan, Choi, Yejin, Dziri, Nouha
We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together…
External link:
http://arxiv.org/abs/2406.18495
Author:
Li, Huihan, Jiang, Liwei, Hwang, Jena D., Kim, Hyunwoo, Santy, Sebastin, Sorensen, Taylor, Lin, Bill Yuchen, Dziri, Nouha, Ren, Xiang, Choi, Yejin
As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models…
External link:
http://arxiv.org/abs/2404.10199
Author:
Chiu, Yu Ying, Jiang, Liwei, Antoniak, Maria, Park, Chan Young, Li, Shuyue Stella, Bhatia, Mehar, Ravi, Sahithya, Tsvetkov, Yulia, Shwartz, Vered, Choi, Yejin
Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current…
External link:
http://arxiv.org/abs/2404.06664
Author:
Mun, Jimin, Jiang, Liwei, Liang, Jenny, Cheong, Inyoung, DeCario, Nicole, Choi, Yejin, Kohno, Tadayoshi, Sap, Maarten
General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating with…
External link:
http://arxiv.org/abs/2403.14791
Author:
Jung, Jaehun, Lu, Ximing, Jiang, Liwei, Brahman, Faeze, West, Peter, Koh, Pang Wei, Choi, Yejin
The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models. While increasingly ubiquitous dependence on such large-scale language models…
External link:
http://arxiv.org/abs/2403.13780
The permanence of online content combined with the enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers…
External link:
http://arxiv.org/abs/2402.08761
Author:
Sorensen, Taylor, Moore, Jared, Fisher, Jillian, Gordon, Mitchell, Mireshghallah, Niloofar, Rytting, Christopher Michael, Ye, Andre, Jiang, Liwei, Lu, Ximing, Dziri, Nouha, Althoff, Tim, Choi, Yejin
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research…
External link:
http://arxiv.org/abs/2402.05070