Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
Authors: Hai Xuan Pham, Samuel Cheung, Vladimir Pavlovic
Year of publication: 2017
Subjects: Facial motion capture, Computer science, Speech recognition, Deep learning, Animation, Recurrent neural network, Artificial intelligence, Hidden Markov model, Computer facial animation
Source: CVPR Workshops
DOI: 10.1109/cvprw.2017.287
Description: We introduce a long short-term memory recurrent neural network (LSTM-RNN) approach to real-time facial animation that automatically estimates the head rotation and facial action unit activations of a speaker from her speech alone. Specifically, the time-varying, contextual, non-linear mapping between the audio stream and visual facial movements is realized by training an LSTM neural network on a large audio-visual data corpus. In this work, we extract a set of acoustic features from the input audio, including the Mel-scaled spectrogram, Mel-frequency cepstral coefficients, and chromagram, which together effectively represent both the contextual progression and the emotional intensity of the speech. Output facial movements are characterized by the 3D rotation and blending expression weights of a blendshape model, which can be used directly for animation. Thus, even though our model does not explicitly predict the affective states of the target speaker, her emotional manifestation is recreated via the expression weights of the face model. Experiments on an evaluation dataset of different speakers across a wide range of affective states demonstrate promising results for real-time speech-driven facial animation. (A minimal code sketch of this pipeline follows the record below.)
Database: OpenAIRE
External link:
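The record includes no code, so the following is a minimal, hypothetical Python sketch of the pipeline the description outlines: extracting a Mel-scaled spectrogram, MFCCs, and a chromagram per audio frame, then mapping the frame sequence to head rotation and blendshape weights with an LSTM. The librosa feature calls are standard; the feature dimensions, network sizes, blendshape count, and the `SpeechToFaceLSTM` class are illustrative assumptions, not the authors' implementation.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_features(wav_path, sr=16000, hop=160):
    """Per-frame acoustic features: Mel spectrogram + MFCCs + chroma,
    matching the three feature types named in the paper's description."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64, hop_length=hop))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12, hop_length=hop)
    # Stack into a (time, 64 + 13 + 12) feature matrix.
    return np.vstack([mel, mfcc, chroma]).T.astype(np.float32)

class SpeechToFaceLSTM(nn.Module):
    """LSTM regressor from acoustic frames to 3 head-rotation angles plus
    blendshape expression weights (all sizes here are assumptions)."""
    def __init__(self, in_dim=89, hidden=256, n_blendshapes=46):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 3 + n_blendshapes)

    def forward(self, x):                       # x: (batch, time, in_dim)
        h, _ = self.lstm(x)
        out = self.head(h)                      # (batch, time, 3 + n_blendshapes)
        rotation = out[..., :3]                 # head rotation (e.g., Euler angles)
        weights = torch.sigmoid(out[..., 3:])   # blendshape weights in [0, 1]
        return rotation, weights
```

Under this sketch, training would minimize a regression loss (e.g., MSE) between predicted and motion-captured rotation/weight sequences from the audio-visual corpus; at inference the recurrent model processes frames sequentially, which is what makes real-time, speech-driven animation feasible.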