MusicFace: Music-driven expressive singing face synthesis
Author: Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Language: English
Publication year: 2023
Subject:
Source: Computational Visual Media, Vol 10, Iss 1, Pp 119-136 (2023)
Document type: article
ISSN: 2096-0433; 2096-0662
DOI: 10.1007/s41095-023-0343-7
Description: Abstract: It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Due to the coupling of the human voice and backing music in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation into speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support the training and evaluation of models for this task and future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.
Database: Directory of Open Access Journals
External link:
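
The abstract describes a decouple-and-fuse pipeline: the input audio is separated into a human voice stream and a backing music stream, the two streams are fused with an attention scheme, and separate outputs are produced for facial expression, head pose (speed and direction), and eye state (blinking and eye closing). The snippet below is only a minimal, hedged PyTorch sketch of that general idea; the module names, feature dimensions, GRU encoders, single cross-attention layer, and output sizes are all illustrative assumptions and do not reproduce the authors' actual architecture.

```python
# Illustrative sketch of a two-stream "decouple-and-fuse" module.
# Assumptions (not from the paper): GRU encoders, one cross-attention
# layer, feat_dim=128, and the output dimensionalities of each head.
import torch
import torch.nn as nn


class TwoStreamFusion(nn.Module):
    def __init__(self, feat_dim=128, num_heads=4):
        super().__init__()
        # Independent encoders for the separated vocal and backing-music streams.
        self.vocal_enc = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.music_enc = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Cross-attention: vocal features attend to backing-music features.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Separate prediction heads, loosely following the abstract's decomposition.
        self.expr_head = nn.Linear(feat_dim, 64)  # facial expression coefficients
        self.pose_head = nn.Linear(feat_dim, 6)   # head movement: speed + direction
        self.eye_head = nn.Linear(feat_dim, 2)    # short-term blink, long-term closing

    def forward(self, vocal_feats, music_feats):
        # vocal_feats, music_feats: (batch, time, feat_dim) frame-level audio
        # features, e.g. from a source-separation front end.
        v, _ = self.vocal_enc(vocal_feats)
        m, _ = self.music_enc(music_feats)
        fused, _ = self.attn(query=v, key=m, value=m)
        fused = fused + v  # residual keeps the voice-driven signal dominant
        return self.expr_head(fused), self.pose_head(fused), self.eye_head(fused)


if __name__ == "__main__":
    model = TwoStreamFusion()
    vocal = torch.randn(1, 100, 128)  # 100 frames of separated vocals
    music = torch.randn(1, 100, 128)  # 100 frames of backing music
    expr, pose, eyes = model(vocal, music)
    print(expr.shape, pose.shape, eyes.shape)  # (1, 100, 64) (1, 100, 6) (1, 100, 2)
```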