LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Author:	Helff, Lukas; Friedrich, Felix; Brack, Manuel; Kersting, Kristian; Schramowski, Patrick
Publication year:	2024
Subject:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
Document type:	Working Paper
Description:	We introduce LlavaGuard, a family of VLM-based safeguard models, offering a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's new responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, our introduced customizable taxonomy categories enable the context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters demonstrating state-of-the-art performance, with even the smallest models outperforming baselines like GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts. Comment: Project page at https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html
Database:	arXiv
External link:	http://arxiv.org/abs/2406.05113