Robust emotion recognition from speech: Gamma tone features and models

Authors: R. Nagakrishnan, C. Jeyalakshmi, A. Revathi, N. Sasikaladevi
Publication year: 2018
Subject:
Source: International Journal of Speech Technology. 21:723-739
ISSN: 1572-8110
1381-2416
Description: Affective computing is gaining importance in ensuring better and more effective human–machine interaction. Because glottal and speech signals carry the emotional characteristics of the speaker in addition to linguistic information, the speaker's emotions need to be recognised for the system to give a meaningful response. This paper emphasises the effectiveness and efficiency of selecting energy features obtained by passing the speech through Gamma tone filters spaced on the equivalent rectangular bandwidth (ERB), MEL and BARK scales. Various modelling techniques are used to develop a robust multi-speaker, speaker-independent emotion/stress recognition system. Because the Berlin EMO-DB database and the SAVEE emotional audio-visual database used in this work contain only a limited set of speech utterances, spoken in different emotions by 10 actors and 4 speakers respectively, improving the performance of the stress/emotion recognition system is challenging. Speaker-independent emotion recognition is performed by extracting Gamma tone energy features and cepstral features from the concatenated training speech after passing it through Gamma tone filters spaced on the ERB, MEL and BARK scales. Subsequently, VQ/Fuzzy clustering models and continuous density hidden Markov models are created for all emotions, and evaluation is done with the utterances of a speaker not included in the training speech. The proposed features are extracted from the test utterances and applied to the VQ/Fuzzy/MHMM/SVM models, and testing is performed using the minimum distance criterion/maximum log-likelihood criterion. The proposed Gamma tone energy/cepstral features and modelling techniques provide complementary evidence in assessing the performance of the system.
On the EMO-DB database, the algorithm achieves a weighted accuracy recall of 96%, 79% and 95.3% for the stress recognition system, for classification with emotion-specific group VQ/Fuzzy/MHMM/SVM models on GTF energy features with Gamma tone filters spaced on the ERB, MEL and BARK scales respectively. On utterances chosen from the SAVEE database, the weighted accuracy recall is 91%, 93% and 94% for classification with emotion-specific group models on GTF energy features with Gamma tone filters spaced on the ERB, MEL and BARK scales respectively. Gamma tone cepstral features give an overall accuracy of 92%, 90% and 92% for filters spaced on the ERB, MEL and BARK scales for the Berlin EMO-DB. Decision-level fusion classification based on GTF energy features and modelling techniques gives an overall accuracy of 99.8% for the EMO-DB database and 100% for the SAVEE database.
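
The feature extraction step named in the description (energy features from Gamma tone filters spaced on the ERB scale) can be illustrated with a minimal sketch. It is not the authors' implementation: the ERB-rate mapping is the commonly used Glasberg and Moore form, and the filter order, the 1.019 bandwidth factor, the 25 ms impulse-response length, the 100 to 4000 Hz range, the filter count and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    # ERB-rate scale (Glasberg & Moore form); assumed here, not taken from the paper.
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_rate_to_hz(erb):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(f_low, f_high, n_filters):
    # Center frequencies equally spaced on the ERB-rate scale.
    points = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_filters)
    return erb_rate_to_hz(points)

def erb_bandwidth(f_hz):
    # Equivalent rectangular bandwidth in Hz at frequency f_hz.
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    # Gammatone impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    # normalised to unit energy. Order 4 and b = 1.019 * ERB are assumptions.
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb_bandwidth(fc)
    ir = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gammatone_log_energies(frame, fs, centers):
    # One log-energy value per filter channel for a single speech frame.
    feats = []
    for fc in centers:
        out = np.convolve(frame, gammatone_ir(fc, fs), mode="same")
        feats.append(np.log(np.sum(out ** 2) + 1e-12))
    return np.array(feats)

# Usage: 20 hypothetical channels between 100 Hz and 4 kHz, applied to a test tone.
fs = 16000
centers = erb_center_frequencies(100.0, 4000.0, 20)
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2.0 * np.pi * centers[10] * t)
features = gammatone_log_energies(tone, fs, centers)
```

Replacing `hz_to_erb_rate` and its inverse with a MEL or BARK mapping would give the other two filter spacings mentioned in the description. The per-frame vectors produced this way are the kind of input a VQ codebook or HMM would be trained on.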
Database: OpenAIRE