Deep learning approaches for speech emotion recognition: state of the art and research challenges
Author: | Faiqa Hanif, Ghulam Mujtaba, Rashid Jahangir, Ying Wah Teh |
---|---|
Year of publication: | 2021 |
Subject: |
Computer Networks and Communications
Computer science; Process (engineering); Deep learning; Feature extraction; Software engineering; Engineering and technology; Machine learning; Speech processing; Field (computer science); Variety (cybernetics); Discriminative model; Hardware and Architecture; Electrical engineering, electronic engineering, information engineering; Media Technology; Artificial intelligence; Software; Human voice |
Source: | Multimedia Tools and Applications. 80:23745-23812 |
ISSN: | 1573-7721 1380-7501 |
DOI: | 10.1007/s11042-020-09874-7 |
Description: | Speech emotion recognition (SER) systems identify emotions from the human voice in areas such as smart healthcare, driving, call centers, automatic translation systems, and human-machine interaction. In the classical SER process, discriminative acoustic feature extraction is the most important and challenging step because discriminative features influence classifier performance and decrease computational time. Nonetheless, current handcrafted acoustic features suffer from limited capability and accuracy in constructing a SER system for real-time implementation. Therefore, to overcome the limitations of handcrafted features, a variety of deep learning techniques have been proposed and employed in recent years for automatic feature extraction in the field of emotion prediction from speech signals. However, to the best of our knowledge, no in-depth review study is available that critically appraises and summarizes the existing deep learning techniques for SER along with their strengths and weaknesses. Hence, this study aims to present a comprehensive review of deep learning techniques for SER, including their uniqueness, benefits, and limitations. Moreover, this review presents speech processing techniques, performance measures, and publicly available emotional speech databases. It also discusses the significance of the findings of the primary studies. Finally, it presents open research issues and challenges that need significant research effort and enhancement in the field of SER systems. |
Database: | OpenAIRE |
External link: |