Showing 1 - 1 of 1 for search: '"Lee, Bingze"'
Aligned Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, LLMs remain susceptible to jailbreak adversarial attacks, where adversaries manipulate prompts to elicit malicious responses that aligned LLMs would otherwise refuse to generate.
External link:
http://arxiv.org/abs/2410.15362