Exploiting Acoustic and Lexical Properties of Phonemes to Recognize Valence from Speech

Authors: Biqiao Zhang, Emily Mower Provost, Soheil Khorram
Publication year: 2019
Subject:
Source: ICASSP
DOI: 10.1109/icassp.2019.8683190
Description: Emotions modulate speech acoustics as well as language. The latter influences the sequences of phonemes that are produced, which in turn further modulate the acoustics. Therefore, phonemes impact emotion recognition in two ways: (1) they introduce an additional source of variability in speech signals and (2) they provide information about the emotion expressed in speech content. Previous work in speech emotion recognition has considered (1) or (2) individually. In this paper, we investigate how we can jointly consider both factors to improve the prediction of emotional valence (positive vs. negative), and the relationship between improved prediction and the emotion elicitation process (e.g., fixed script, improvisation, natural interaction). We present a network that exploits both the acoustic and the lexical properties of phonetic information using multi-stage fusion. Our results on the IEMOCAP and MSP-Improv datasets show that our approach outperforms systems that either do not consider the influence of phonetic information or that only consider a single aspect of this influence.
Database: OpenAIRE