Robust tri-modal automatic speech recognition for consumer applications

Authors:	Jie Tang, S. J. Anderson, Alvis C. M. Fong
Year of publication:	2013
Subjects:	Engineering; Modal; business.industry; Robustness (computer science); Speech recognition; Ambient noise level; Media Technology; Discrete cosine transform; Mel-frequency cepstrum; Electrical and Electronic Engineering; business; Sensory cue; Speaker adaptation
Source:	IEEE Transactions on Consumer Electronics, 59:352-360
ISSN:	0098-3063
Description:	Commercial automatic speech recognition (ASR) began to appear in the late 1980s and can offer a more natural means of accepting user input than typing on keyboards or touch screens. This is a particularly important consideration for small consumer devices such as smartphones. In many practical situations, however, ASR performance can be significantly compromised by ambient noise and variable lighting conditions. Previous research has shown that adding visual cues to standard ASR can mitigate the effects of ambient noise. However, audiovisual (AV) ASR is not robust against variable lighting conditions, which users of consumer devices often encounter. Since thermal imaging is invariant to changing lighting conditions, the authors propose a tri-modal thermal-audiovisual (TAV) ASR using adaptations of established techniques such as MT, the discrete cosine transform (DCT) and Mel-frequency cepstral coefficients (MFCC). Experimental results demonstrate the robustness of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only ASR and +11.8% over AV ASR recognition rates. The authors believe that robust ASR will lead to improved user experiences.
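The description lists the discrete cosine transform (DCT) among the adapted techniques but does not say how it is applied. In audiovisual ASR generally, a common use of the DCT is to compress a grayscale lip-region image into a handful of low-frequency coefficients that serve as visual features. The sketch below assumes that generic usage only; the function names, the block size and the choice of keeping the top-left k x k coefficients are hypothetical illustrations, not the authors' pipeline.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling so the transform preserves signal energy.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    # cols[j][u] holds coefficient (u, j); transpose back to [u][v] order.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def low_freq_features(block, k):
    """Hypothetical visual feature: the top-left k x k DCT coefficients."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

if __name__ == "__main__":
    # A tiny made-up 4x4 "image patch"; real lip regions would be larger.
    patch = [[(i * 4 + j) % 5 for j in range(4)] for i in range(4)]
    print(low_freq_features(patch, 2))
```

Keeping only low-frequency coefficients discards fine pixel detail while retaining the coarse shape of the region, which is why this style of feature is popular for compact visual front ends. The same code would apply unchanged to a thermal image patch, although whether the authors did so is not stated in this record.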
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::b28aa2ae17c5cf920711862fdcd4a3c2 https://doi.org/10.1109/tce.2013.6531117