Triphone based unit selection for concatenative visual speech synthesis

Authors:	Eric Cosatto, Hans Peter Graf, Fu Jie Huang
Year:	2002
Subjects:	Voice activity detection, Computer science, Speech recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Speech synthesis, Speech processing, computer.software_genre, Triphone, computer
Source:	ICASSP (IEEE International Conference on Acoustics, Speech and Signal Processing)
DOI:	10.1109/icassp.2002.5745033
Abstract:	Concatenative visual speech synthesis selects frames from a large recorded video database of mouth shapes to generate photo-realistic talking-head sequences. The synthesized sequence must exhibit precise lip-sound synchronization and smooth articulation. The selection process for finding the best lip shapes has been computationally expensive [1], limiting synthesis speed to far less than real time. In this paper, we propose a rapid unit selection approach based on triphone units. Experiments show that this algorithm makes the synthesis, excluding rendering, 50 times faster than real time on a standard desktop PC. We also developed an objective metric for the quality of the synthesis and show that it is consistent with subjective evaluation results.
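The record gives no algorithmic detail beyond the abstract. As a rough illustration of what triphone-based unit selection generally involves, the Python sketch below splits a phoneme sequence into triphones, looks up recorded candidate units per triphone, and picks the lowest-cost sequence with a Viterbi search. Everything in it is a hypothetical assumption, not the authors' method: the `Unit` type, the `"sil"` padding, the fallback to any unit with the same center phoneme at a fixed `mismatch_penalty`, and the Euclidean join cost between mouth-shape feature vectors.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


@dataclass(frozen=True)
class Unit:
    """Hypothetical recorded video segment for one phoneme in context."""
    unit_id: str
    start_shape: Tuple[float, ...]  # assumed mouth-shape features, first frame
    end_shape: Tuple[float, ...]    # assumed mouth-shape features, last frame


def to_triphones(phonemes: Sequence[str], pad: str = "sil") -> List[Triphone]:
    """Turn a phoneme string into (left, center, right) triphones, padding with silence."""
    padded = [pad, *phonemes, pad]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def join_cost(a: Unit, b: Unit) -> float:
    """Assumed smoothness cost: distance between the end of a and the start of b."""
    return math.dist(a.end_shape, b.start_shape)


def candidates(db: Dict[Triphone, List[Unit]], tri: Triphone,
               mismatch_penalty: float) -> List[Tuple[Unit, float]]:
    """Exact triphone matches cost 0; otherwise fall back to same-center units."""
    exact = db.get(tri, [])
    if exact:
        return [(u, 0.0) for u in exact]
    return [(u, mismatch_penalty)
            for key, units in db.items() if key[1] == tri[1] for u in units]


def select_units(phonemes: Sequence[str], db: Dict[Triphone, List[Unit]],
                 mismatch_penalty: float = 1.0) -> List[Unit]:
    """Viterbi search over the candidate lattice minimizing target + join costs."""
    lattice = [candidates(db, tri, mismatch_penalty) for tri in to_triphones(phonemes)]
    if any(not column for column in lattice):
        raise ValueError("no candidate unit for some phoneme")

    costs = [target for _, target in lattice[0]]   # best path cost ending in each candidate
    back: List[List[int]] = [[-1] * len(lattice[0])]  # back-pointers per column
    for i in range(1, len(lattice)):
        prev = lattice[i - 1]
        new_costs, pointers = [], []
        for unit, target in lattice[i]:
            totals = [costs[k] + join_cost(prev[k][0], unit) for k in range(len(prev))]
            best_k = min(range(len(totals)), key=totals.__getitem__)
            new_costs.append(totals[best_k] + target)
            pointers.append(best_k)
        costs, back = new_costs, back + [pointers]

    # Trace the cheapest path backwards through the back-pointers.
    j = min(range(len(costs)), key=costs.__getitem__)
    path: List[Unit] = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j][0])
        j = back[i][j]
    return path[::-1]
```

For example, with a toy database holding one unit for `("sil", "m", "a")` and two units for `("m", "a", "sil")`, `select_units(["m", "a"], db)` picks whichever "a" unit starts closest to where the "m" unit ends. Precomputing a triphone index like `db` keeps each lookup cheap, which is the kind of saving a fast selection scheme would rely on.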
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::f7cab4b84af90d1669e1435a72088ee4 https://doi.org/10.1109/icassp.2002.5745033