Plain-to-clear speech video conversion for enhanced intelligibility
Author: | Shubam Sachdeva, Haoyao Ruan, Ghassan Hamarneh, Dawn M. Behne, Allard Jongman, Joan A. Sereno, Yue Wang |
---|---|
Publication year: | 2023 |
Subject: | |
Source: | International Journal of Speech Technology |
ISSN: | 1572-8110, 1381-2416 |
DOI: | 10.1007/s10772-023-10018-z |
Description: | Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels, produced by multiple male and female talkers. Via a frame-by-frame image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI lip reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles and have achieved enhanced intelligibility for AI; (2) this work suggests that universal, talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce the “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the generated videos are high-definition, making them ideal candidates for human-centric intelligibility and perceptual training studies. |
Database: | OpenAIRE |
External link: |
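The "displacement factor" described in the abstract can be sketched as a linear scaling of facial-landmark displacements between plain and clear speech styles. The function name, landmark layout, and interpolation form below are illustrative assumptions, not the paper's implementation; in a full pipeline, the displaced landmarks would drive frame-by-frame image warping of the plain-speech video.

```python
import numpy as np

def apply_displacement(plain_pts: np.ndarray,
                       clear_pts: np.ndarray,
                       alpha: float) -> np.ndarray:
    """Shift each (x, y) landmark of a plain-speech frame by alpha times
    its displacement toward the corresponding clear-speech landmark.
    alpha = 0 keeps the plain style; alpha = 1 reaches the clear-speech
    target; intermediate values scale the modification magnitude."""
    return plain_pts + alpha * (clear_pts - plain_pts)

# Toy example: two lip-corner landmarks from one frame (coordinates assumed).
plain = np.array([[100.0, 200.0], [140.0, 200.0]])
clear = np.array([[ 96.0, 204.0], [144.0, 204.0]])

half = apply_displacement(plain, clear, 0.5)  # halfway modification
full = apply_displacement(plain, clear, 1.0)  # full clear-speech target
```

A warping step (e.g., piecewise-affine or thin-plate-spline warping of each frame to the displaced landmarks) would then produce the modified video frames.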