Deep Learning Framework for Speech Emotion Classification: A Survey of the State-of-the-Art

Authors: Samson Akinpelu, Serestina Viriri
Language: English
Year of publication: 2024
Subject:
Source: IEEE Access, Vol 12, Pp 152152-152182 (2024)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3474553
Description: The intricate landscape of speech emotion classification poses a captivating yet challenging realm due to emotions being fundamental to human communication. In recent years, deep learning frameworks have emerged as powerful tools, shedding light on the elusive domain of emotion recognition, revolutionizing human-computer interactions, and enhancing the emotional intelligence of artificial intelligence (AI). This survey embarks on an exploratory journey into the forefront of deep learning approaches dedicated to speech emotion classification. Deep learning has become the standard approach due to the scarcity of extensive speech corpora and the need for high accuracy at low computational cost. The reason lies in its potency to extract important emotional features from large or medium-sized spectrogram images. Deep learning has been applied to speech emotion classification by many researchers, leading to significant improvements in performance and accuracy. Modern deep learning methods designed for human auditory speech emotion classification are carefully examined in this work. A thorough examination of various deep learning framework designs used in emotion classification is provided, illuminating unique characteristics that capture essential features from speech signals for accurate emotion prediction. The research critically analyzes selected deep models using well-established emotion corpora, highlighting their effectiveness. This research also analyzes typical performance evaluation metrics used to evaluate speech emotion classification models. With this review, we hope to offer a comprehensive overview of the state-of-the-art, potential directions for further investigation, and developing approaches that further the field of speech emotion classification with deep learning frameworks.
Database: Directory of Open Access Journals