Popis: |
This paper presents an identity-invariant facial expression recognition framework. It aims to make a facial expression recognition (FER) model independently understand facial expressions and identity (ID) attributes such as gender, age, and skin, which are entangled in face images. The learned representations of the FER model pursue robustness against unseen ID samples with large attribute differences. Specifically, attribute properties describing (facial) images are retrieved through a powerful pre-trained model, i.e., CLIP. Then, expression features and ID features are realized through residual module(s). As a result, the features learn expression-efficient and ID-invariant representations based on mutual information. The proposed framework is compatible with various backbones, and enables detachment/attachment of ID attributes and ablative analysis. Extensive experiments for several wild Valence-Arousal domain databsets showed the performance improvement of maximum 9% compared to the runner up, and also demonstrated the subjective realism of ID-invariant representation in high-dimensional image space. |