Showing 1 - 3 of 3 for search: '"Aqrawi, Alan"'
The Single-Turn Crescendo Attack (STCA), first introduced in Aqrawi and Abbasi [2024], is an innovative method designed to bypass the ethical safeguards of text-to-text AI models, compelling them to generate harmful content. This technique leverages
External link:
http://arxiv.org/abs/2411.18699
This study explores the ability of Large Language Model (LLM) agents to detect and correct hallucinations in AI-generated content. A primary agent was tasked with creating a blog about a fictional Danish artist named Flipfloppidy, which was then revi
External link:
http://arxiv.org/abs/2410.14262
Authors:
Aqrawi, Alan; Abbasi, Arian
This paper introduces a new method for adversarial attacks on large language models (LLMs) called the Single-Turn Crescendo Attack (STCA). Building on the multi-turn crescendo attack method introduced by Russinovich, Salem, and Eldan (2024), which gr
External link:
http://arxiv.org/abs/2409.03131