LSTM Based Lip Reading Approach for Devanagiri Script
Author: | Mahesh S. Patil, Priyanka M Nabapure, Anand S. Meti, Satyadhyan Chickerur, Sunaina Mahindrakar, Soumya Kanyal, Sonali Naik |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Computer science
media_common.quotation_subject; Speech recognition; Feature extraction; Informática; lstm; lcsh:QA75.5-76.95; Task (project management); Reading (process); lip-reading; General Environmental Science; media_common; Hindi; business.industry; Deep learning; feature extraction; General Engineering; Computing; deep learning; Computación; language.human_language; machine learning; Devanagari; language; General Earth and Planetary Sciences; Data set (IBM mainframe); Artificial intelligence; lcsh:Electronic computers. Computer science; Paragraph; business; Information Technology |
Source: | Advances in Distributed Computing and Artificial Intelligence Journal, Vol 8, Iss 3, Pp 13-26 (2020); GREDOS: Repositorio Institucional de la Universidad de Salamanca, Universidad de Salamanca (USAL) |
ISSN: | 2255-2863 |
Description: | Speech communication in a noisy environment is a difficult and challenging task. Many professionals work in noisy environments such as aviation, construction, or manufacturing, and find it difficult to communicate orally. Such environments need an automated lip-reading system to help communicate instructions and commands. This paper proposes a novel lip-reading solution, which extracts the geometrical shape of lip movement from video and predicts the words or sentences spoken. A data set specific to an Indian language is developed, consisting of lip movement information captured from 50 persons. These include students in the age group of 18 to 20 years and faculty in the age group of 25 to 40 years. Each participant spoke a paragraph of 58 words in 10 sentences in Hindi (Devanagari script, spoken in India), recorded under various conditions. The implementation consists of facial parts detection combined with long short-term memory (LSTM) networks. The proposed solution predicts the spoken words with 77% and 35% accuracy for data sets of 3 and 10 words, respectively. Sentences are predicted with 20% accuracy, which is encouraging. |
Database: | OpenAIRE |
External link: |