Search results - "Aqrawi, Alan"

Report

An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA)

Authors: Kwartler, Ted; Bagan, Nataliia; Banny, Ivan; Aqrawi, Alan; Abbasi, Arian

The Single-Turn Crescendo Attack (STCA), first introduced in Aqrawi and Abbasi [2024], is an innovative method designed to bypass the ethical safeguards of text-to-text AI models, compelling them to generate harmful content. This technique leverages

External link: http://arxiv.org/abs/2411.18699

Report

Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation

Authors: Kwartler, Ted; Berman, Matthew; Aqrawi, Alan

This study explores the ability of Large Language Model (LLM) agents to detect and correct hallucinations in AI-generated content. A primary agent was tasked with creating a blog about a fictional Danish artist named Flipfloppidy, which was then revi

External link: http://arxiv.org/abs/2410.14262

Report

Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)

Authors: Aqrawi, Alan; Abbasi, Arian

This paper introduces a new method for adversarial attacks on large language models (LLMs) called the Single-Turn Crescendo Attack (STCA). Building on the multi-turn crescendo attack method introduced by Russinovich, Salem, and Eldan (2024), which gr

External link: http://arxiv.org/abs/2409.03131
