Showing 1 - 2 of 2 for search: '"Cao, Yebo"'
Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. This paper proposes a new type of jailbreak attack that shifts the attention of the LLM by inserting a prohibited
External link:
http://arxiv.org/abs/2408.11182
Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. Existing jailbreak attacks can successfully deceive LLMs; however, they cannot deceive humans. This paper pr
External link:
http://arxiv.org/abs/2404.04849