Popis: |
Emotion recognition plays a crucial role in understanding human behaviour and improving human–computer interaction. However, conventional methods fail to detect multiple emotions from image, speech, and video data. This study therefore proposes Emotion-Net, a comprehensive approach to emotion recognition from multiple modalities, including face, speech, and video data. In the preprocessing stage, the input data from the face, speech, and video sources are cleaned to improve the quality and consistency of the information. Techniques such as noise removal, normalization, and alignment are applied to ensure reliable feature extraction. Next, the Histogram of Oriented Gradients (HOG) technique is employed to capture important visual patterns and spatial information from facial images, video frames, and speech data. HOG provides a compact and descriptive representation of facial features, enabling effective emotion recognition. To select the most informative and discriminative features, the Effective Seeker Optimization Algorithm (ESOA) is employed; it selects the best features from the HOG output using a similarity-based selection process. Finally, the Hidden Markov Convolutional Neural Network (HMCNN) classifier is used for emotion recognition. This classifier combines the strengths of Hidden Markov Models (HMMs) and Convolutional Neural Networks (CNNs), allowing the model to capture both temporal dynamics and spatial dependencies in the ESOA-selected features. Experimental evaluations show that the proposed Emotion-Net achieves high performance in recognizing different emotions from face, speech, and video data. |
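The HOG feature-extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python variant that assumes a grayscale image given as a list of rows, uses hard binning into unsigned orientation bins, and normalizes each cell on its own, whereas standard HOG interpolates between bins and normalizes over overlapping blocks of cells. The function names `hog_cell_histograms` and `hog_descriptor`, the cell size, and the bin count are illustrative assumptions.

```python
import math


def hog_cell_histograms(image, cell_size=4, n_bins=9):
    """Per-cell histograms of unsigned gradient orientation (0 to 180 degrees).

    `image` is a list of rows of grayscale values (hypothetical input format).
    """
    height, width = len(image), len(image[0])
    cells_y, cells_x = height // cell_size, width // cell_size
    bin_width = 180.0 / n_bins
    hist = [[[0.0] * n_bins for _ in range(cells_x)] for _ in range(cells_y)]
    for y in range(cells_y * cell_size):
        for x in range(cells_x * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = int(angle // bin_width) % n_bins
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[y // cell_size][x // cell_size][bin_index] += magnitude
    return hist


def hog_descriptor(image, cell_size=4, n_bins=9, eps=1e-6):
    """Flatten the L2-normalized cell histograms into one feature vector."""
    features = []
    for row in hog_cell_histograms(image, cell_size, n_bins):
        for cell in row:
            # Simplification: normalize per cell instead of per block of cells.
            norm = math.sqrt(sum(v * v for v in cell) + eps * eps)
            features.extend(v / norm for v in cell)
    return features


if __name__ == "__main__":
    # An 8x8 image with a vertical edge: dark left half, bright right half.
    edge_image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
    descriptor = hog_descriptor(edge_image)
    print(len(descriptor))
```

With an 8x8 input and 4x4 cells, the descriptor has one 9-bin histogram for each of the four cells. A feature-selection stage such as the ESOA step described above would then operate on a vector of this kind.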