Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary

Author:	Zhang, Sibo, Yuan, Jiahong, Liao, Miao, Zhang, Liangjun
Year of publication:	2021
Subject:	Computer Science - Computer Vision and Pattern Recognition; Electrical Engineering and Systems Science - Image and Video Processing
Document type:	Working Paper
Description:	With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesizing video from text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video generation algorithms, our approach has a number of advantages: 1) it needs only a fraction of the training data used by an audio-driven approach; 2) it is more flexible and less vulnerable to speaker variation; 3) it significantly reduces preprocessing, training and inference time. We perform extensive experiments to compare the proposed method with state-of-the-art talking-face generation methods on a benchmark dataset and on datasets of our own. The results demonstrate the effectiveness and superiority of our approach.
Comment:	ICASSP 2022
Database:	arXiv
External link:	http://arxiv.org/abs/2104.14631
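The description names the pipeline (a phoneme-pose dictionary, interpolated phoneme poses, a GAN that renders video) but gives no details. As a rough illustration only, the Python sketch below shows one way the pose-interpolation step could work. It assumes that a forced aligner supplies (phoneme, start, end) timings in seconds, that each dictionary entry is a flattened vector of facial landmark coordinates, that each phoneme's key pose sits at the midpoint of its interval, and that poses in between are linearly interpolated. The function name interpolate_poses and all of these choices are hypothetical, not the authors' implementation, and the GAN stage is not shown.

```python
from __future__ import annotations

import numpy as np


def interpolate_poses(
    alignment: list[tuple[str, float, float]],
    pose_dict: dict[str, np.ndarray],
    fps: float = 25.0,
) -> np.ndarray:
    """Return one pose vector per video frame (hypothetical helper).

    alignment: ordered, non-overlapping (phoneme, start_sec, end_sec) triples,
        assumed to come from a forced aligner.
    pose_dict: hypothetical phoneme-pose dictionary mapping each phoneme to a
        key pose, e.g. flattened 2D facial landmark coordinates.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one phoneme")

    # Assumption: each key pose is anchored at the midpoint of its phoneme.
    key_times = np.array([(start + end) / 2.0 for _, start, end in alignment])
    key_poses = np.stack([pose_dict[ph] for ph, _, _ in alignment])  # (K, D)

    # Sample frame timestamps over the whole utterance.
    duration = alignment[-1][2]
    n_frames = int(round(duration * fps))
    frame_times = np.arange(n_frames) / fps

    # Linearly interpolate each pose dimension independently; np.interp holds
    # the first/last key pose constant outside [key_times[0], key_times[-1]].
    frames = np.stack(
        [np.interp(frame_times, key_times, key_poses[:, d])
         for d in range(key_poses.shape[1])],
        axis=1,
    )
    return frames  # shape (n_frames, D)
```

For example, with the dictionary {"HH": [0, 0], "AY": [1, 2]} and the alignment [("HH", 0.0, 0.2), ("AY", 0.2, 0.6)] at 10 fps, the function returns six frames that hold the HH pose up to 0.1 s, move linearly toward the AY pose, and hold it from 0.4 s onward. Linear interpolation is simply the easiest choice to show here; a smoother scheme such as spline interpolation could be substituted without changing the interface.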