Description: |
Multimodal Language Models (MMLMs), such as LLaVA and GPT-4V, have shown zero-shot generalization capabilities for understanding images and text across various domains. However, their effectiveness in open-world visual tasks, particularly anomaly detection under challenging conditions such as low light or poor image quality, has yet to be thoroughly investigated. Assessing the robustness and limitations of MMLMs in these scenarios is essential for ensuring their reliability and safety in real-world applications, where input image quality can vary significantly. To address this gap, we propose a benchmark of 460 images captured under challenging conditions, including low light and blurring, designed specifically to evaluate the anomaly detection capabilities of MMLMs. We assess the performance of state-of-the-art MMLMs, such as Qwen-VL-Max-0809, GPT-4V, Gemini-1.5, Claude3-opus, ERNIE-Bot-4, and SparkDesk-v3.5, across six diverse scenes. Our evaluations indicate that these MMLMs struggle with anomaly detection in adverse scenarios, highlighting the need for further investigation into the underlying causes and potential improvement strategies. To tackle these limitations, we introduce ADAGENT, a novel anomaly detection agent framework that combines the “Chain of Critical Self-Reflection (CCS)”, specialized toolsets, and “Heuristic Retrieval-Augmented Generation (RAG)” to enhance anomaly detection performance with MMLMs. ADAGENT sequentially evaluates abilities such as text generation, semantic understanding, contextual comprehension, key information extraction, reasoning, and logical thinking. By implementing this framework, we demonstrate a $15\%\sim 30\%$ improvement in top-3 accuracy on anomaly detection tasks under adverse conditions compared with baseline approaches.
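
  Purely as an illustration of the kind of agent loop the description refers to, the sketch below shows one possible wiring of image-enhancement tools, heuristic retrieval, and critical self-reflection around a generic MMLM. Every function name here (query_mmlm, retrieve_examples, enhance_image) is a hypothetical placeholder introduced for this sketch, not the paper's actual interface.

    # Minimal sketch of an ADAGENT-style loop around a generic chat-style MMLM client.
    # All callables passed in (query_mmlm, retrieve_examples, enhance_image) are
    # hypothetical placeholders, not the framework's real API.

    def adagent_detect(image_path, query_mmlm, retrieve_examples, enhance_image, rounds=2):
        """Return an anomaly report for one image captured under adverse conditions."""
        # Specialized toolset: pre-process the degraded image (e.g., low-light
        # enhancement, deblurring) before it reaches the MMLM.
        image = enhance_image(image_path)

        # Heuristic RAG: fetch a few scene-specific exemplars or rules to ground the prompt.
        context = retrieve_examples(image, k=3)

        prompt = (
            "You are inspecting an image captured under difficult conditions.\n"
            f"Reference notes: {context}\n"
            "List any anomalies you can identify and explain your reasoning."
        )
        answer = query_mmlm(image, prompt)

        # Chain of Critical Self-Reflection: the model critiques and then revises
        # its own report for a fixed number of rounds.
        for _ in range(rounds):
            critique = query_mmlm(
                image,
                f"Critically review this anomaly report for missed or spurious findings:\n{answer}",
            )
            answer = query_mmlm(
                image,
                f"Revise the report using this critique.\nReport: {answer}\nCritique: {critique}",
            )
        return answer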