Anomaly Detection in Weakly Supervised Videos Using Multistage Graphs and General Deep Learning Based Spatial-Temporal Feature Enhancement
Author: | Jungpil Shin, Yuta Kaneko, Abu Saleh Musa Miah, Najmul Hassan, Satoshi Nishimura |
Language: | English |
Year of publication: | 2024 |
Subject: |
temporal contextual aggregation (TCA); uncertainty-regulated dual memory units (UR-DMU); graph convolutional networks; global/local multi-head self-attention (GL-MHSA); weakly supervised video anomaly detection (WS-VAD); anomaly detection; Electrical engineering. Electronics. Nuclear engineering; TK1-9971 |
Source: | IEEE Access, Vol 12, Pp 65213-65227 (2024) |
Document type: | article |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3395329 |
Description: | Weakly supervised video anomaly detection (WS-VAD) is a crucial research domain in computer vision for the implementation of intelligent surveillance systems. Many researchers have been working to develop WS-VAD systems using various technologies by assessing anomaly scores. However, these systems still face challenges due to the lack of effective feature extraction. To mitigate this limitation, we propose a multi-stage deep-learning model that separates abnormal events from normal ones by extracting effective hierarchical features. In the first stage, we extract two feature streams using pre-trained models: the first stream employs a ViT-based CLIP module to select the top-k features, while the second stream utilizes a CNN-based I3D module integrated with the Temporal Contextual Aggregation (TCA) mechanism. These features are concatenated and fed into the second-stage module, where an Uncertainty-Regulated Dual Memory Units (UR-DMU) model learns representations of normal and abnormal data simultaneously. The UR-DMU integrates global and local structures, leveraging Graph Convolutional Networks (GCN) and Global and Local Multi-Head Self-Attention (GL-MHSA) modules to capture video associations. Subsequently, feature reduction is achieved by integrating a multilayer perceptron (MLP) with the Prompt-Enhanced Learning (PEL) module via knowledge-based prompts. Finally, we employ a classifier module to predict snippet-level anomaly scores. In the training phase, an aggregation function transfers the snippet-level scores into bag-level predictions, encouraging high activation in anomalous cases. Our approach integrates these cutting-edge technologies and methodologies, offering a comprehensive solution to video-based anomaly detection. Extensive experiments on the ShanghaiTech, XD-Violence, and UCF-Crime datasets validate the superiority of our method over state-of-the-art approaches by a substantial margin.
We believe that our model holds significant promise for real-world applications, demonstrating superior performance and efficacy in anomaly detection tasks. |
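The training step described in the abstract, where snippet-level anomaly scores are aggregated into a bag-level (video-level) prediction, is a common multiple-instance-learning pattern. The sketch below illustrates one plausible form of it; the top-k mean aggregation and the binary cross-entropy objective are assumptions for illustration, as the record does not specify the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def bag_level_loss(snippet_scores: torch.Tensor,
                   bag_labels: torch.Tensor,
                   k: int = 3) -> torch.Tensor:
    """Hypothetical MIL-style training objective.

    snippet_scores: (batch, num_snippets) anomaly scores in [0, 1],
                    one row per video (bag).
    bag_labels:     (batch,) video-level labels (1 = anomalous video).
    The mean of each bag's top-k snippet scores serves as the
    bag-level prediction, trained against the weak video-level label.
    """
    # Select the k highest-scoring snippets per video.
    topk = torch.topk(snippet_scores, k, dim=1).values  # (batch, k)
    # Average them into a single bag-level prediction.
    bag_pred = topk.mean(dim=1)                          # (batch,)
    return F.binary_cross_entropy(bag_pred, bag_labels)
```

Because only video-level labels are available in the weakly supervised setting, this kind of aggregation is what lets snippet-level scores be learned at all: an anomalous bag needs only a few high-scoring snippets, while a normal bag is pushed toward uniformly low scores.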
Database: | Directory of Open Access Journals |
External link: |