Exploiting Acoustic and Lexical Properties of Phonemes to Recognize Valence from Speech
Author: | Biqiao Zhang, Emily Mower Provost, Soheil Khorram |
---|---|
Year of publication: | 2019 |
Subject: | Improvisation; Speech Acoustics; Computer science; Speech recognition; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications; 02 engineering and technology; Valence (psychology); 010301 acoustics; 01 natural sciences; Convolutional neural network |
Source: | ICASSP |
DOI: | 10.1109/icassp.2019.8683190 |
Description: | Emotions modulate speech acoustics as well as language. The latter influences the sequences of phonemes that are produced, which in turn further modulate the acoustics. Therefore, phonemes impact emotion recognition in two ways: (1) they introduce an additional source of variability in speech signals and (2) they provide information about the emotion expressed in speech content. Previous work in speech emotion recognition has considered (1) or (2) individually. In this paper, we investigate how both factors can be considered jointly to improve the prediction of emotional valence (positive vs. negative), and how the improvement in prediction relates to the emotion elicitation process (e.g., fixed script, improvisation, natural interaction). We present a network that exploits both the acoustic and the lexical properties of phonetic information using multi-stage fusion. Our results on the IEMOCAP and MSP-Improv datasets show that our approach outperforms systems that either do not consider the influence of phonetic information or consider only a single aspect of this influence. |
Database: | OpenAIRE |
External link: |