Search results

Report

Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles

Authors: Wang, Zhilong; Wang, Haizhou; Luo, Nanqing; Zhang, Lan; Sun, Xiaoyan; Cao, Yebo; Liu, Peng

Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. This paper proposes a new type of jailbreak attack that shifts the attention of the LLM by inserting a prohibited

External link: http://arxiv.org/abs/2408.11182

Report

Hidden You Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Logic Chain Injection

Authors: Wang, Zhilong; Cao, Yebo; Liu, Peng

Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. Existing jailbreak attacks can successfully deceive LLMs; however, they cannot deceive humans. This paper pr

External link: http://arxiv.org/abs/2404.04849
