Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

Authors: Yorghos Voutos, Georgios Drakopoulos, Georgios Chrysovitsiotis, Zoi Zachou, Dimitris Kikidis, Efthymios Kyrodimos, Themis Exarchos
Language: English
Publication year: 2022
Subject:
Source: Computers, Vol 11, Iss 3, p 34 (2022)
Document type: article
ISSN: 2073-431X
DOI: 10.3390/computers11030034
Description: Voice loss is a serious disorder that is strongly associated with social isolation. Multimodal information sources, such as audiovisual information, are valuable because they can support the development of straightforward personalized word prediction models that reproduce the patient’s original voice. In this work, we designed a multimodal approach based on audiovisual information recorded from patients before loss of voice to develop a system for automated lip-reading in the Greek language. Data pre-processing methods, such as lip segmentation and frame-level sampling, were used to enhance the quality of the imaging data. Audio information was incorporated in the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model correctly identified test words in different time frames with 95% accuracy. To our knowledge, this is the first word prediction model trained to recognize words from video recordings in the Greek language.
Database: Directory of Open Access Journals
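
Illustrative note: the description says audio was used to automatically annotate sets of frames as words, but this record does not say how. The sketch below shows one possible way such an annotation could work, under stated assumptions: a simple short-time energy threshold marks voiced spans in the audio, and each span is mapped to a range of video frame indices. The function names, the threshold rule, and the parameters `hop_ms` and `fps` are hypothetical and are not taken from the article.

```python
def voiced_spans(energy, threshold):
    """Return (start, end) index pairs of consecutive audio windows whose
    energy exceeds the threshold. The end index is exclusive."""
    spans, start = [], None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i                      # a voiced span begins
        elif e <= threshold and start is not None:
            spans.append((start, i))       # the span ended at window i
            start = None
    if start is not None:                  # span runs to the end of the audio
        spans.append((start, len(energy)))
    return spans


def spans_to_frames(spans, hop_ms, fps):
    """Map window-index spans to video frame ranges (end exclusive).

    hop_ms: time between audio windows, in whole milliseconds (assumed).
    fps:    video frame rate, in whole frames per second (assumed).
    Integer arithmetic avoids floating-point rounding at span edges.
    """
    frames = []
    for start, end in spans:
        first = (start * hop_ms * fps) // 1000        # round down
        last = -((-end * hop_ms * fps) // 1000)       # round up
        frames.append((first, last))
    return frames


def annotate(energy, threshold, hop_ms, fps, words):
    """Pair each voiced span, in order, with a word label (hypothetical)."""
    frames = spans_to_frames(voiced_spans(energy, threshold), hop_ms, fps)
    return [(w, f0, f1) for w, (f0, f1) in zip(words, frames)]
```

With one energy value every 100 ms and 25 fps video, `annotate(energy, 0.5, 100, 25, ["word_1", "word_2"])` would label the frames that fall inside the first two voiced spans. A fixed energy threshold is only the simplest choice; a real recording would likely need a more robust voice activity detector.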