Description: |
Automatically estimating a user’s emotional behaviour from speech content and facial expressions plays a crucial role in the development of intelligent human-computer interaction systems. Considerable effort has therefore been devoted to developing emotion-aware systems that are useful in real-life applications. However, several major challenges remain. First, hand-engineered features are not sufficiently effective or discriminative to represent the emotional content of raw audio or video inputs. Second, conventional deep learning structures and models do not take the characteristics of emotions into account and thus need to be adapted. Third, continual emotion perception and empathic behaviour analysis have not been investigated so far, although both are highly relevant when such systems are deployed in real-life products. To address these challenges, this thesis proposes a set of representation learning approaches and emotion modelling frameworks. For the representation learning task, data-driven approaches based on deep learning are introduced, aiming to learn discriminative, context-aware, and largely modality-invariant features that represent emotional states. Further, several novel deep network structures are designed and investigated to enhance existing emotion recognition systems. More precisely, this is achieved either by combining the strengths of different sub-networks or by exploiting the level of disagreement among annotations as an indicator of difficulty. Alternatively, models can be trained jointly on heterogeneous data, or acquire additional knowledge through adversarial learning. Extensive experiments on various spontaneous emotional datasets demonstrate that the proposed methods outperform the current state-of-the-art methods in both dimensional emotion regression and categorical emotion classification tasks.
Moreover, this thesis sheds light on how deep learning techniques can be deployed to effectively address lifelong emotion recognition and automatic empathy detection. |