Visual Speech Recognition: A Deep Learning Approach

Authors: Varsha Patil, Kavita Hegde, Anand Ramesh, Navin Kumar Mudaliar
Year of publication: 2020
Source: 2020 5th International Conference on Communication and Electronics Systems (ICCES).
Description: A machine capable of lip-reading would have been deemed impossible a few decades ago. However, the rapid growth of machine learning in recent years has made it possible for a machine to understand human speech from visual input alone. Numerous studies indicate that only a small fraction of the English language can be comprehended through visual data alone, i.e. lip reading. Expert lip-readers can infer only about 3–4% of the words spoken after viewing videos (without audio) multiple times, and they also draw on other cues such as body language, facial expressions, habits, and context; the task is tedious and exhausting. The proposed visual speech recognition approach uses deep learning to perform word-level classification. A ResNet architecture with 3D convolution layers serves as the encoder and Gated Recurrent Units (GRU) as the decoder, with the whole video sequence used as input. The results of the proposed approach are satisfactory: it achieves 90% accuracy on the BBC data set and 88% on the custom video data set. The approach is currently limited to the word level but can be extended to short phrases or sentences.
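To make the encoder–decoder pipeline concrete, the sketch below traces tensor shapes through a 3D-convolutional front-end of the kind the abstract describes (ResNet-style encoder feeding a GRU decoder). The clip size (29 frames of 112×112 mouth crops) and the kernel/stride/padding values are illustrative assumptions, not figures from the paper; only the standard convolution output-size formula is relied on.

```python
def conv3d_out(size, kernel, stride, padding):
    # Standard convolution output-size formula, applied per dimension:
    # out = floor((in + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical input clip: 29 frames of 112x112 mouth-region crops
# (sizes chosen for illustration, not taken from the paper).
T, H, W = 29, 112, 112

# A typical 3D front-end convolution: kernel (5,7,7), stride (1,2,2),
# padding (2,3,3) - temporal resolution preserved, spatial halved.
T1 = conv3d_out(T, kernel=5, stride=1, padding=2)
H1 = conv3d_out(H, kernel=7, stride=2, padding=3)
W1 = conv3d_out(W, kernel=7, stride=2, padding=3)

print((T1, H1, W1))  # (29, 56, 56)

# A ResNet trunk would then pool each frame to a feature vector,
# yielding a (T1, feature_dim) sequence for the GRU decoder; the
# final GRU state feeds a softmax over the word vocabulary.
```

The key design point this illustrates is that the front-end keeps the temporal dimension intact (stride 1 in time) so the GRU still sees one feature vector per video frame, while spatial downsampling reduces per-frame cost.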
Database: OpenAIRE