BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos.

Authors: Sharma, Gaurav; Singh, Maheep; Kumain, Sandeep Chand; Kumar, Kamal
Source: Signal, Image & Video Processing; Jan 2025, Vol. 19, Issue 1, p1-9, 9p
Abstract: Video saliency prediction aims to simulate human visual attention by locating the most pertinent and informative regions within a video frame or sequence. Even when the audio modality is ignored, temporal and spatial information is essential for measuring video saliency, especially under challenging factors such as swift motion, changing backgrounds, and nonrigid deformation. Moreover, applying image saliency models directly to video is inadequate because it neglects temporal information. To address these problems, this paper proposes a novel Bidirectional Multi-scale SpatioTemporal Network (BMST-Net) for identifying salient objects in videos. BMST-Net yields notable results for any given frame sequence, employing an encoder-decoder technique to learn and map features over time and space. The model combines a bidirectional LSTM (Long Short-Term Memory) with a CNN (Convolutional Neural Network), where a single VGG16 (Visual Geometry Group) layer is used to extract features from the input video frames. The proposed approach produced noteworthy results in both qualitative and quantitative evaluations on challenging, publicly available video datasets, achieving competitive performance against state-of-the-art saliency models. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
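
The pipeline summarized in the abstract (per-frame CNN features from a single VGG16 layer, a bidirectional LSTM over the frame sequence, and an encoder-decoder mapping to per-frame saliency maps) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration only, not the authors' implementation; the class name, pooling size, hidden size, decoder head, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class BiLSTMSaliencySketch(nn.Module):
    """Hypothetical encoder-decoder: VGG16 features -> bidirectional LSTM -> saliency maps."""
    def __init__(self, hidden_size=256):
        super().__init__()
        # Assumed spatial encoder: the VGG16 convolutional stack
        # (outputs 512 x 7 x 7 feature maps for a 224 x 224 frame).
        self.encoder = vgg16(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Bidirectional LSTM over the sequence of per-frame feature vectors.
        self.bilstm = nn.LSTM(input_size=512 * 7 * 7, hidden_size=hidden_size,
                              batch_first=True, bidirectional=True)
        # Assumed decoder head: project temporal features to a coarse 7 x 7 map.
        self.decoder = nn.Linear(2 * hidden_size, 7 * 7)

    def forward(self, frames):
        # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(b * t, c, h, w))   # (b*t, 512, h/32, w/32)
        feats = self.pool(feats).flatten(1)                    # (b*t, 512*7*7)
        temporal, _ = self.bilstm(feats.reshape(b, t, -1))     # (b, t, 2*hidden_size)
        coarse = self.decoder(temporal).reshape(b * t, 1, 7, 7)
        # Upsample the coarse map to the input resolution and squash to [0, 1].
        saliency = torch.sigmoid(
            F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=False))
        return saliency.reshape(b, t, 1, h, w)

# Usage: a clip of 4 frames at 224 x 224 yields one saliency map per frame.
clip = torch.randn(1, 4, 3, 224, 224)
maps = BiLSTMSaliencySketch()(clip)
print(maps.shape)  # torch.Size([1, 4, 1, 224, 224])
```

The bidirectional recurrence lets each frame's prediction draw on both preceding and following frames, which is the key difference from applying a static image saliency model frame by frame, as the abstract points out.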