A Data Driven Approach to Audiovisual Speech Mapping

Authors:	Amir Hussain, Peter Derleth, Jon Barker, Roger Watt, Andrew Abel, Ricard Marxer, Bill Whitmer
Contributors:	Liu, Cheng-Lin; Hussain, Amir; Luo, Bin; Tan, Kay Chen; Zeng, Yi; Zhang, Zhaoxiang
Year of publication:	2016
Subject:	Speech; Acoustics; Computer science; Speech recognition; Frame (networking); Speech technology; Speech corpus; 02 engineering and technology; Viseme; Speech processing; Filter bank; 01 natural sciences; Data-driven; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 010301 acoustics
Source:	Advances in Brain Inspired Cognitive Systems ISBN: 9783319496849 BICS
ISSN:	0302-9743
DOI:	10.1007/978-3-319-49685-6_30
Description:	The use of visual information as part of audio speech processing has attracted significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics using only temporal visual information, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are used to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
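The mapping described above pairs a short window of visual frames with one audio frame. The following is a minimal sketch of how such training pairs might be assembled, not the authors' implementation. The function name `make_training_pairs`, the window length `k`, and the choice to end each window at the frame being estimated are all assumptions made for illustration; the record does not state them. Feature extraction (2D-DCT, log filterbank) and the MLP itself are left out.

```python
def make_training_pairs(visual, audio, k):
    """Pair each audio frame with a window of k visual frames.

    visual: list of per-frame visual feature vectors (e.g. 2D-DCT coefficients)
    audio:  list of per-frame audio feature vectors (e.g. log filterbank values)
    k:      hypothetical window length, in frames

    Assumption: the window ends at the frame whose audio is being estimated.
    Returns (inputs, targets), where each input is the k visual vectors
    concatenated into one flat list and each target is one audio vector.
    """
    if len(visual) != len(audio):
        raise ValueError("visual and audio must have the same number of frames")
    if k < 1 or k > len(visual):
        raise ValueError("k must be between 1 and the number of frames")

    inputs, targets = [], []
    for end in range(k - 1, len(visual)):
        window = visual[end - k + 1:end + 1]
        # Flatten the k visual frames into a single input vector.
        inputs.append([value for frame in window for value in frame])
        targets.append(list(audio[end]))
    return inputs, targets


# Toy data: 5 frames, 2 visual features and 3 audio features per frame.
visual = [[2 * t, 2 * t + 1] for t in range(5)]
audio = [[3 * t, 3 * t + 1, 3 * t + 2] for t in range(5)]
inputs, targets = make_training_pairs(visual, audio, k=3)
```

Each row of `inputs` could then be fed to a regression model such as an MLP, with the matching row of `targets` as the desired output.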
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::81e3e46d0b9c80de1bf0dae683168727 https://doi.org/10.1007/978-3-319-49685-6_30