An audio-visual approach to simultaneous-speaker speech recognition

Authors:	Eric Patterson, John N. Gowdy
Year of publication:	2004
Subjects:	Audio mining; Voice activity detection; Computer science; Speech recognition; Speech coding; Speech technology; Acoustic model; Speech corpus; Speech synthesis; Viseme; Intelligibility (communication); VoxForge; Linear predictive coding; Speaker recognition; Speech processing; Background noise; Noise; Source separation; Speech analytics
Source:	ICASSP (5)
DOI:	10.1109/icassp.2003.1200087
Description:	Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noise are significantly reduced by the additional information provided by extra visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. The paper presents a study using audio-visual speech recognition for simultaneous-speaker speech recognition. The desired goal is to separate and potentially recognize speech from several simultaneous speakers. Speaker pairs from the CUAVE multimodal speech corpus (see http://ece.clemson.edu/speech) are used. Audio-visual techniques are compared against speaker-independent and speaker-dependent audio-only methods for speech recognition of individuals from these pairs.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::4b6e5930d57b6464c2e865094cb38e61 https://doi.org/10.1109/icassp.2003.1200087