Speech emotion recognition using data augmentation
Author: | V. M. Praseetha, P. P. Joby |
---|---|
Year of publication: | 2021 |
Subject: |
Linguistics and Language
Computer science; Speech recognition; Deep learning; Feature vector; SIGNAL (programming language); Feature extraction; Filter bank; Language and Linguistics; Call centre; Human-Computer Interaction; Preprocessor; Robot; Computer Vision and Pattern Recognition; Artificial intelligence; Software |
Source: | International Journal of Speech Technology. 25:783-792 |
ISSN: | 1572-8110 1381-2416 |
DOI: | 10.1007/s10772-021-09883-3 |
Description: | Humans are emotional beings, so uttered speech reflects human emotion. Human-computer interaction can be made more effective by automatically identifying emotions from speech. Automatic speech emotion recognition is applied in many areas, such as computer gaming, call centres, speech therapy, and robot control. Emotion recognition can be viewed as a mapping from feature space to label space. Features are computed from the uttered speech, and the relationship between those features and the emotions is learned so that emotions can be recognized automatically. The collected training samples are preprocessed, features are extracted from the speech signals, and the extracted feature vectors are stored in a database. When an input signal arrives, it undergoes the same preprocessing and feature extraction, and the extracted features are compared with the feature vectors in the database to determine the emotion in that speech signal. We have developed a deep learning model for speech emotion recognition based on a gated recurrent unit (GRU), which takes the filterbank energies of the speech signals as input. To overcome the limited availability of databases and to increase the number of input samples, we have applied data augmentation. |
Database: | OpenAIRE |
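
The description mentions filterbank energies as the model input and data augmentation to enlarge the training set, but the record gives no parameters or augmentation methods. The sketch below is only an illustration under assumed settings: a 16 kHz sample rate, 25 ms frames with a 10 ms hop, 26 triangular mel filters, and two common augmentations (additive Gaussian noise and a circular time shift). All function names and defaults are hypothetical and are not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel-spaced filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def log_filterbank_energies(signal, sample_rate=16000, frame_len=400,
                            hop=160, n_fft=512, n_filters=26):
    """Log mel filterbank energies, shape (n_frames, n_filters).

    Assumes len(signal) >= frame_len. Parameter values are assumptions.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)        # small offset avoids log(0)

def add_noise(signal, snr_db, rng):
    """Assumed augmentation: add white Gaussian noise at a target SNR in dB."""
    noise_power = np.mean(signal ** 2) / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def time_shift(signal, shift):
    """Assumed augmentation: circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)

# Usage: one second of a synthetic tone stands in for a speech recording.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
augmented = [clean, add_noise(clean, 20.0, rng), time_shift(clean, 800)]
features = [log_filterbank_energies(x) for x in augmented]
```

Each entry of `features` is a frame-by-filter matrix that could serve as the input sequence to a recurrent model such as a GRU; the augmented copies keep the emotion label of the original recording.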