Popis: |
Commercial automatic speech recognition (ASR) started to appear in the late 1980?s and can offer a more natural means of accepting user inputs than methods such as typing on keyboards or touch screens. This is a particularly important consideration for small consumer devices such as smartphones. In many practical situations, however, performance of ASR can be significantly compromised due to ambient noise and variable lighting conditions. Previous research has shown that adding visual cues to standard ASR can mitigate the effects of ambient noise. However, audiovisual (AV) ASR is not robust against variable lighting conditions, which are often encountered by users of consumer devices. Since thermal imaging is invariant to changing lighting conditions, the authors propose a trimodal thermal-audiovisual (TAV) ASR using adaptations of established techniques such as MT, DCT and MFCC. Experimental results demonstrate the robustness of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only ASR and +11.8% over AVASR recognition rates The authors believe that robust ASR will lead to improved user experiences. |