Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
Authors: | Paavo Alku, P. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana |
---|---|
Contributors: | Department of Signal Processing and Acoustics, Koneru Lakshmaiah Education Foundation, International Institute of Information Technology Hyderabad, Aalto University |
Year of publication: | 2020 |
Subjects: | Speech production, Binary decision diagram, Computer science, Applied Mathematics, Speech recognition, 020206 networking & telecommunications, 02 engineering and technology, Fundamental frequency, 16. Peace & justice, Sadness, 030507 speech-language pathology & audiology, 03 medical and health sciences, Computer Science::Sound, Signal Processing, Modulation (music), 0202 electrical engineering, electronic engineering, information engineering, Mel-frequency cepstrum, 0305 other medical science, Prosody, Energy (signal processing) |
Source: | Circuits, Systems, and Signal Processing, vol. 39, pp. 4459–4481 |
ISSN: | 1531-5878, 0278-081X |
DOI: | 10.1007/s00034-020-01377-y |
Description: | In the generation of emotional speech, the speech production features deviate from those of neutral (non-emotional) speech. The objective of this study is to capture the deviations in features related to the excitation component of speech and to develop a system for automatic emotion recognition based on these deviations. The emotions considered in this study are anger, happiness, sadness, and the neutral state. The study shows that the deviations of the excitation features carry useful information that can be exploited to develop an emotion recognition system. The excitation features used in this study are the instantaneous fundamental frequency ($F_0$), the strength of excitation, the energy of excitation, and the ratio of high-frequency to low-frequency band energy ($\beta$). A hierarchical binary decision tree approach is used to develop an emotion recognition system with neutral speech as the reference. The recognition experiments showed that the excitation features perform comparably to or better than existing prosody features and spectral features such as mel-frequency cepstral coefficients, perceptual linear predictive coefficients, and modulation spectral features. |
Database: | OpenAIRE |
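The excitation-based approach summarized in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Only the $\beta$ feature (ratio of high- to low-frequency band energy), the idea of measuring deviations from neutral speech, and the use of hierarchical binary decisions come from the record. The function names `beta_ratio`, `deviation_from_neutral`, and `classify`, the 1 kHz band cutoff, the z-score normalization against neutral-speech statistics, the feature ordering, the threshold of 2, and the order and rules of the decision nodes are all hypothetical.

```python
import numpy as np

# HYPOTHETICAL: the record does not state where the low/high band split lies.
CUTOFF_HZ = 1000.0


def beta_ratio(frame: np.ndarray, fs: float, cutoff_hz: float = CUTOFF_HZ) -> float:
    """Ratio of high-band to low-band energy of one frame, from its Hann-windowed power spectrum."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = power[freqs < cutoff_hz].sum()
    high = power[freqs >= cutoff_hz].sum()
    return float(high / max(low, 1e-12))  # guard against division by zero


def deviation_from_neutral(features: np.ndarray, neutral: np.ndarray) -> np.ndarray:
    """Z-score of utterance-level features against neutral-speech statistics (assumed normalization).

    features: shape (n_features,); neutral: shape (n_neutral_utterances, n_features).
    """
    mu = neutral.mean(axis=0)
    sigma = neutral.std(axis=0) + 1e-12  # avoid dividing by zero for constant features
    return (features - mu) / sigma


def classify(dev: np.ndarray, thr: float = 2.0) -> str:
    """HYPOTHETICAL hierarchical binary decisions on deviations from neutral speech.

    Assumed feature order: [F0, strength of excitation, energy of excitation, beta].
    The node order, rules, and threshold are illustrative, not taken from the paper.
    """
    # Node 1: neutral vs. emotional: every feature stays close to the neutral reference.
    if np.all(np.abs(dev) < thr):
        return "neutral"
    # Node 2: sadness vs. the other emotions: assume sadness lowers excitation energy.
    if dev[2] < 0:
        return "sadness"
    # Node 3: anger vs. happiness: assume anger shifts beta more than F0.
    return "anger" if dev[3] > dev[0] else "happiness"


if __name__ == "__main__":
    fs = 16000
    t = np.arange(512) / fs
    print(beta_ratio(np.sin(2 * np.pi * 200 * t), fs))   # small: energy mostly below the cutoff
    print(beta_ratio(np.sin(2 * np.pi * 3000 * t), fs))  # large: energy mostly above the cutoff
    neutral = np.array([[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]])
    print(classify(deviation_from_neutral(np.array([2.0, 2.0, 5.0, 8.0]), neutral)))  # anger
```

In this sketch a 200 Hz tone sampled at 16 kHz yields a $\beta$ near zero, while a 3 kHz tone yields a very large $\beta$. In a real system the utterance-level features would be extracted from the excitation component of speech, and the decision at each node would be learned from labeled data rather than set by hand; normalizing against neutral statistics is one simple way to express "neutral speech as reference".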