Abstract: |
Salient Object Detection (SOD) is a crucial task in digital image processing that aims to detect the objects in images or videos that attract the most human attention. These visually prominent objects are referred to as salient objects in computer vision and image processing. Automatically recognizing them is important for applications such as video summarization, automated cropping for compression, image and video captioning, and action recognition. Over the last two decades, the research community has proposed various methods to mimic the human visual ability to find the object(s) that receive the most attention. Early methods relied primarily on conventional approaches; more recently, deep learning-based techniques have gained significant interest and popularity for salient object detection in images and videos. In this work, the authors introduce an innovative model that employs a dual-stream encoder–decoder architecture for accurate saliency estimation in videos. Integrating an attention mechanism and non-local blocks makes the network more robust, leading to improved identification of salient objects. To assess the proposed model's effectiveness, comprehensive evaluations were conducted on well-known publicly available datasets such as VOS, DAVSOD, and ViSAL. The experimental results demonstrate that the proposed model achieves competitive performance compared to state-of-the-art methods on the S-Measure, F-Measure, and MAE evaluation metrics. [ABSTRACT FROM AUTHOR] |
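
Two of the evaluation metrics named in the abstract, MAE and F-Measure, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `mae` and `f_measure`, the fixed binarization threshold of 0.5, and the toy 2x2 maps are assumptions made for illustration. The weighting β² = 0.3 is the value conventionally used in the salient object detection literature; the abstract does not state which value the authors used.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map (values in [0, 1])
    and a binary ground-truth mask, both given as lists of rows."""
    flat_pred = [v for row in pred for v in row]
    flat_gt = [v for row in gt for v in row]
    return sum(abs(p - g) for p, g in zip(flat_pred, flat_gt)) / len(flat_pred)


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a saliency map binarized at `threshold` (an assumed,
    fixed threshold) against a binary ground-truth mask."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical toy example: a 2x2 predicted saliency map and its mask.
pred = [[0.9, 0.2],
        [0.6, 0.1]]
gt = [[1, 0],
      [0, 0]]

print(mae(pred, gt))
print(f_measure(pred, gt))
```

Lower MAE is better, while a higher F-Measure is better. In video benchmarks, per-frame scores like these are typically averaged over all frames of a dataset.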