Abstract: |
Efficient speech emotion recognition (SER) is a crucial component of natural and intuitive human-computer interaction systems. This research addresses the challenge of achieving high SER accuracy; its objective is to improve accuracy through optimized feature extraction and selection together with an optimized random forest model. This paper presents a machine learning framework for SER based on metaheuristic principles. The framework comprises two primary components: feature extraction and feature selection. In addition, it generates an optimized random forest (RF) model by using the hybrid bat algorithm (HBA) to fine-tune the RF hyperparameters; the resulting approach is called the Bat Random Forest Hybrid Meta Heuristic Algorithm (BRHAMO). Three speech corpora were used: RAVDESS, SAVEE, and the novel ANAKE Hindi speech corpus. Experiments were conducted with two categories of speech features: spectral features and a combination of hybrid features. In the spectral feature category, the BRHAMO-based model achieved accuracies of 81%, 79.6%, and 77.6% on the RAVDESS, SAVEE, and ANAKE datasets, respectively. In the hybrid feature category, it achieved accuracies of 93.8%, 85.4%, and 89.8% on the same three datasets, respectively. The performance of BRHAMO was compared with several benchmark machine learning models: vanilla random forest, gradient boosting, adaptive boosting (AdaBoost), and support vector machines. The results indicate that the metaheuristic algorithm (MHA)-based approach delivers better accuracy, precision, recall, and F1 score than all of the individual classifiers in both feature categories. [ABSTRACT FROM AUTHOR] |
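The abstract does not give the update rules of the hybrid bat algorithm (HBA), so the following is only a minimal sketch, assuming the standard bat algorithm as originally proposed by Xin-She Yang, of how a bat swarm could search two random forest hyperparameters. It is not the authors' HBA or BRHAMO. The function names, search ranges, parameter defaults, and `toy_cv_error` are hypothetical; the toy objective is a synthetic stand-in for cross-validated error and does not reproduce any result from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def bat_algorithm(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_bats: int = 20,
    n_iter: int = 200,
    f_min: float = 0.0,
    f_max: float = 2.0,
    alpha: float = 0.9,   # loudness decay factor
    gamma: float = 0.9,   # pulse-rate growth constant
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with a basic (non-hybrid) bat algorithm."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside its [lo, hi] search range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    loud = [1.0] * n_bats                        # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]   # initial pulse rates r_i^0
    pulse = r0[:]                                # current pulse rates r_i
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Frequency-tuned velocity update pulls bat i toward the global best.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - xb) * freq for v, x, xb in zip(vel[i], pos[i], best)]
            cand = clip([x + v for x, v in zip(pos[i], vel[i])])
            if rng.random() > pulse[i]:
                # Local random walk around the best solution; scaling the step
                # by each range width is an assumption of this sketch.
                mean_loud = sum(loud) / n_bats
                cand = clip([xb + rng.uniform(-1, 1) * mean_loud * 0.1 * (hi - lo)
                             for xb, (lo, hi) in zip(best, bounds)])
            cand_fit = objective(cand)
            # Accept an improving move with probability A_i, then get quieter
            # (smaller A_i) and pulse more often (larger r_i).
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                pulse[i] = r0[i] * (1 - math.exp(-gamma * t))
            if cand_fit <= best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


def toy_cv_error(x: List[float]) -> float:
    """Hypothetical stand-in for random forest cross-validation error (NOT real data)."""
    n_estimators, max_depth = round(x[0]), round(x[1])
    return ((n_estimators - 300) / 100) ** 2 + ((max_depth - 12) / 5) ** 2


if __name__ == "__main__":
    params, err = bat_algorithm(toy_cv_error, bounds=[(10, 500), (2, 30)])
    print("n_estimators =", round(params[0]), "max_depth =", round(params[1]), "error =", err)
```

In a real pipeline, `objective` would instead train a random forest (for example scikit-learn's `RandomForestClassifier`) with the decoded hyperparameters and return one minus the mean accuracy from `cross_val_score`. Rounding continuous bat positions to integers, as `toy_cv_error` does, is one simple way to handle integer hyperparameters such as the number of trees and the maximum depth.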