Showing 1 - 10 of 4,635 for search: "red teaming"
As generative AI technologies find more and more real-world applications, the importance of testing their performance and safety seems paramount. "Red-teaming" has quickly become the primary approach to test AI models--prioritized by AI companies,
External link:
http://arxiv.org/abs/2412.09751
Recent studies have discovered that LLMs have serious privacy leakage concerns, where an LLM may be fooled into outputting private information under carefully crafted adversarial prompts. These risks include leaking system prompts, personally identif
External link:
http://arxiv.org/abs/2412.05734
Author:
Karnik, Sathwik, Hong, Zhang-Wei, Abhangi, Nishant, Lin, Yen-Chen, Wang, Tsun-Hsuan, Agrawal, Pulkit
Language-conditioned robot models (i.e., robotic foundation models) enable robots to perform a wide range of tasks based on natural language instructions. Despite strong performance on existing benchmarks, evaluating the safety and effectiveness of t
External link:
http://arxiv.org/abs/2411.18676
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. While various safety mechanisms have been developed, the field lacks systematic tools for evalu
External link:
http://arxiv.org/abs/2411.16769
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks. This survey paper presents a comprehensive analysis of recent
External link:
http://arxiv.org/abs/2410.09097
Author:
Pavlova, Maya, Brinkman, Erik, Iyer, Krithika, Albiero, Vitor, Bitton, Joanna, Nguyen, Hailey, Li, Joe, Ferrer, Cristian Canton, Evtimov, Ivan, Grattafiori, Aaron
Red teaming assesses how large language models (LLMs) can produce content that violates norms, policies, and rules set during their safety training. However, most existing automated methods in the literature are not representative of the way humans t
External link:
http://arxiv.org/abs/2410.01606
Author:
Munoz, Gary D. Lopez, Minnich, Amanda J., Lutz, Roman, Lundeen, Richard, Dheekonda, Raja Sekhar Rao, Chikanov, Nina, Jagdagdorj, Bolor-Erdene, Pouliot, Martin, Chawla, Shiven, Maxwell, Whitney, Bullwinkel, Blake, Pratt, Katherine, de Gruyter, Joris, Siska, Charlotte, Bryan, Pete, Westerhoff, Tori, Kawaguchi, Chang, Seifert, Christian, Kumar, Ram Shankar Siva, Zunger, Yonatan
Generative Artificial Intelligence (GenAI) is becoming ubiquitous in our daily lives. The increase in computational power and data availability has led to a proliferation of both single- and multi-modal models. As the GenAI ecosystem matures, the nee
External link:
http://arxiv.org/abs/2410.02828