Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call Center Corpus
Author: | Deschamps-Berger, Théo; Lamel, Lori; Devillers, Laurence |
---|---|
Year of publication: | 2023 |
Subject: | |
Source: | Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Document type: | Working Paper |
DOI: | 10.1109/ICASSP49357.2023.10096112 |
Description: | Emotion detection technology that supports human decision-making is an important research topic for real-world applications, but real-life emotion datasets are relatively rare and small. The experiments in this paper use the CEMO corpus, which was collected in a French emergency call center. Two pre-trained models, one based on speech and one on text, were fine-tuned for speech emotion recognition. Using pre-trained Transformer encoders mitigates the limited and sparse nature of our data. This paper explores different fusion strategies for these modality-specific models. In particular, fusions with and without cross-attention mechanisms were tested to gather the most relevant information from both the speech and text encoders. We show that multimodal fusion brings an absolute gain of 4-9% with respect to either single modality, and that the symmetric multi-headed cross-attention mechanism performed better than classical late fusion approaches. Our experiments also suggest that, for the real-life CEMO corpus, the audio component encodes more emotive information than the textual one. Comment: 5 pages, 2 figures, 4 tables |
Database: | arXiv |
External link: |
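
The description mentions fusing speech and text encoders with a symmetric cross-attention mechanism. The record gives no architectural details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single attention head, no learned projection matrices, mean pooling, and random stand-in vectors in place of real encoder outputs. All function names (`cross_attention`, `mean_pool`, `symmetric_fusion`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where one modality queries the other.

    Simplification (assumption): single head, and the same vectors serve
    as keys and values, with no learned projections.
    """
    d = len(queries[0])
    contexts, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        w = softmax(scores)
        # Weighted sum of the other modality's vectors.
        ctx = [sum(wj * kv[i] for wj, kv in zip(w, keys_values)) for i in range(d)]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


def mean_pool(seq):
    """Average a sequence of vectors into one vector."""
    n = len(seq)
    return [sum(v[i] for v in seq) / n for i in range(len(seq[0]))]


def symmetric_fusion(speech, text):
    """Run attention in both directions and concatenate the pooled results."""
    speech_to_text, _ = cross_attention(speech, text)  # speech frames attend to text
    text_to_speech, _ = cross_attention(text, speech)  # text tokens attend to speech
    return mean_pool(speech_to_text) + mean_pool(text_to_speech)


# Random stand-ins for encoder outputs: 6 speech frames, 3 text tokens, dim 4.
random.seed(0)
speech = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
text = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
fused = symmetric_fusion(speech, text)
```

In a setup like this, the fused vector would typically be passed to a small classifier that predicts the emotion label. A multi-headed variant would repeat the attention step over several learned projections of the inputs and concatenate the per-head results.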