Concept-guided multi-level attention network for image emotion recognition.

Authors: Yang, Hansen; Fan, Yangyu; Lv, Guoyun; Liu, Shiya; Guo, Zhe
Source: Signal, Image & Video Processing; Jul 2024, Vol. 18, Issue 5, pp. 4313-4326, 14 p.
Abstract: Image emotion recognition aims to predict people's emotional responses to visual stimuli. Recently, emotional region discovery has become a hot topic in this field because it brings significant improvement to the task. Existing studies mainly discover emotional regions through sophisticated analysis of the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts to guide the mining of emotional regions, CMANet is designed as a multi-level architecture in which an attended semantic feature is first calculated under the guidance of the feature from the holistic branch. Subsequently, the attended semantic feature is used to attend to the emotional regions of the feature map in the local branch. An adaptive fusion method is then proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable weight cross-entropy loss, which enables the model to focus on hard samples, is proposed to improve performance on the task. Experiments on several affective image datasets show that the proposed method is effective and outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
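
Note: The abstract mentions a variable weight cross-entropy loss that makes the model focus on hard samples, but it does not give the formula. The sketch below is only an illustration of that general idea, not the authors' loss: it assumes a focal-loss-style weight (1 - p_true)^gamma, where p_true is the predicted probability of the correct class. The function names and the gamma parameter are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def variable_weight_cross_entropy(logits, target, gamma=2.0):
    """Cross-entropy for one sample, scaled by a per-sample weight.

    ASSUMPTION: the weight (1 - p_true) ** gamma is a focal-loss-style
    stand-in; the abstract does not specify the actual weighting scheme.
    With gamma = 0 this reduces to the ordinary cross-entropy loss.
    """
    p_true = softmax(logits)[target]
    weight = (1.0 - p_true) ** gamma  # near 0 for easy samples, near 1 for hard ones
    return -weight * math.log(p_true)


# Same logits, two different ground-truth labels.
logits = [5.0, 0.0, 0.0]
easy_loss = variable_weight_cross_entropy(logits, target=0)  # confident and correct
hard_loss = variable_weight_cross_entropy(logits, target=1)  # confident and wrong
```

Under this assumed weighting, a sample the model already classifies confidently contributes almost nothing to the loss, while a misclassified sample keeps nearly its full cross-entropy value, which is one common way to shift training toward easily confused categories.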