Facial Expression Recognition Based on Vision Transformer with Hybrid Local Attention

Autor: Yuan Tian, Jingxuan Zhu, Huang Yao, Di Chen
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: Applied Sciences, Vol 14, Iss 15, p 6471 (2024)
Druh dokumentu: article
ISSN: 2076-3417
DOI: 10.3390/app14156471
Popis: Facial expression recognition has wide application prospects in many occasions. Due to the complexity and variability of facial expressions, facial expression recognition has become a very challenging research topic. This paper proposes a Vision Transformer expression recognition method based on hybrid local attention (HLA-ViT). The network adopts a dual-stream structure. One stream extracts the hybrid local features and the other stream extracts the global contextual features. These two streams constitute a global–local fusion attention. The hybrid local attention module is proposed to enhance the network’s robustness to face occlusion and head pose variations. The convolutional neural network is combined with the hybrid local attention module to obtain feature maps with local prominent information. Robust features are then captured by the ViT from the global perspective of the visual sequence context. Finally, the decision-level fusion mechanism fuses the expression features with local prominent information, adding complementary information to enhance the network’s recognition performance and robustness against interference factors such as occlusion and head posture changes in natural scenes. Extensive experiments demonstrate that our HLA-ViT network achieves an excellent performance with 90.45% on RAF-DB, 90.13% on FERPlus, and 65.07% on AffectNet.
Databáze: Directory of Open Access Journals