Using Multimodal Large Language Models (MLLMs) for Automated Detection of Traffic Safety-Critical Events

Authors:	Mohammad Abu Tami, Huthaifa I. Ashqar, Mohammed Elhenawy, Sebastien Glaser, Andry Rakotonirainy
Language:	English
Year of publication:	2024
Subject:	multimodal large language models (MLLMs); safety-critical events; in-context learning (ICL); self-ensemble learning; object-level question–answers (QAs); Mechanical engineering and machinery, TJ1-1570; Machine design and drawing, TJ227-240; Motor vehicles. Aeronautics. Astronautics, TL1-4050
Source:	Vehicles, Vol 6, Iss 3, Pp 1571-1590 (2024)
Document type:	article
ISSN:	2624-8921
DOI:	10.3390/vehicles6030074
Description:	Traditional approaches to safety event analysis in autonomous systems have relied on complex machine learning and deep learning models, together with extensive datasets, to achieve high accuracy and reliability. However, the emergence of multimodal large language models (MLLMs) offers a novel approach by integrating textual, visual, and audio modalities. Our framework leverages the logical and visual reasoning power of MLLMs, directing their output through object-level question–answer (QA) prompts to ensure accurate, reliable, and actionable insights for safety-critical event detection and analysis. By incorporating models such as Gemini-Pro-Vision 1.5, we aim to automate safety-critical event detection and analysis while mitigating common issues such as hallucinations in MLLM outputs. The results demonstrate the framework’s potential in different in-context learning (ICL) settings, such as zero-shot and few-shot learning. Furthermore, we investigate other settings, such as self-ensemble learning and a varying number of frames. The results show that the few-shot learning model consistently outperformed the other learning models, achieving the highest overall accuracy of about 79%. A comparative analysis with previous studies on visual reasoning showed that earlier models achieved only moderate performance on driving safety tasks, whereas our proposed model significantly outperformed them. To the best of our knowledge, our proposed MLLM-based model is the first of its kind capable of handling multiple tasks for each safety-critical event: it can identify risky scenarios, classify diverse scenes, determine car directions, categorize agents, and recommend appropriate actions, setting a new standard in safety-critical event management. This study shows the significance of MLLMs in advancing the analysis of naturalistic driving videos to improve safety-critical event detection and the understanding of interactions in complex environments.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/59cca77ca8e9458e97393f33ac986fce